Site Reliability Engineer
We are a diverse group of entrepreneurs with a simple objective: make it easy to buy and understand energy, so that together, we can create a more efficient ecosystem and sustainable world.
ABOUT YOU:
Dash Engineering is looking for a Site Reliability Engineer to help lead, design, and build the next generation of our large-scale energy procurement SaaS product. In this role, you will implement modern interpretations of observability, chaos engineering, and DevOps. This role is highly visible and impactful to the organization and will help shape Dash’s engineering culture for years to come.
WHERE WE NEED YOUR HELP
- Evangelize best practices for building and operating highly secure and reliable systems
- Build and manage highly reliable, redundant, secure, and scalable customer-facing infrastructure with robust observability and monitoring capabilities
- Use monitoring tools to find problems, resolve them or escalate to development, and ensure that we exceed our SLAs
- Design, manage, and maintain tools to automate infrastructure and operational processes
- Proactively recommend system design improvements to meet security, reliability, and capacity requirements
- Conduct timely retrospectives of production infrastructure incidents
- Assist with all aspects of operational security and compliance
- Seek out potential threats to security and reliability and advocate solutions
- Participate in a 24/7 on-call rotation to receive escalations
- Help design, manage, and support our new SaaS platform
- Recommend process and architecture improvements
- Oversee pre-production acceptance testing to ensure the high quality of our services and products
REQUIRED QUALIFICATIONS
- 3+ years of work experience in an SRE or Infrastructure Engineer role, preferably for a SaaS product
- Deep expertise in Amazon Web Services (AWS) and Infrastructure-as-Code (IaC) tools like CloudFormation
- Ability to judge when to triage and when to dive into a root-cause analysis
- Substantial experience with Python and JavaScript
- Passion for secure, reliable, scalable, observable software with a strong sense of ownership
- Experience with Linux system administration
- Experience developing and monitoring mission-critical systems
- Experience with distributed cloud service development, infrastructure deployment, traffic management, and architecture
Benefits:
- Dental insurance
- Health insurance
- Flexible schedule
- Unlimited paid time off (PTO)
- $100 allowance for cell phone bill
- Allowance for home office set up
- We got Swag!
Schedule:
- Monday to Friday
We are 100% remote.
Job Type: Full-time