Senior Site Reliability Engineer
Devopie Inc.
Job Description
What You’ll Do
You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.
Reliability Engineering
- Define and manage SLIs, SLOs, and error budgets
- Reduce MTTD, MTTA, and MTTR through structured incident response
- Conduct blameless postmortems and drive preventative improvements
- Champion reliability in architectural reviews and production readiness
Observability & Monitoring
- Design actionable, symptom-based alerts (not noise)
- Build dashboards and tracing systems using tools like CloudWatch, Prometheus, Grafana, New Relic, X-Ray, and ADOT
- Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
- Ensure full observability coverage across critical paths
Cloud & Infrastructure
- Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
- Build resilient, multi-AZ, regionally replicated systems
- Implement autoscaling and fault-tolerant architectures
- Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)