Senior Site Reliability Engineer
Devopie Inc. · Alberta, CA · Posted March 20, 2026
Job Description
What You’ll Do
You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.
Reliability Engineering
- Define and manage SLIs, SLOs, and error budgets
- Reduce MTTD, MTTA, and MTTR through structured incident response
- Conduct blameless postmortems and drive preventative improvements
- Champion reliability in architectural reviews and production readiness
Observability & Monitoring
- Design actionable, symptom-based alerts (not noise)
- Build dashboards and tracing systems using tools such as CloudWatch, Prometheus, Grafana, New Relic, X-Ray, and ADOT
- Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
- Ensure full observability coverage across critical paths
Cloud & Infrastructure
- Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
- Build resilient, multi-AZ and regionally replicated systems
- Implement autoscaling and fault-tolerant architecture
- Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)