Senior Site Reliability Engineer

Full Time, Senior

Regina, Saskatchewan, CA

Posted 9 weeks ago

Job Description

What You’ll Do

You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.

Reliability Engineering

Define and manage SLIs, SLOs, and error budgets
Reduce MTTD, MTTA, and MTTR through structured incident response
Conduct blameless postmortems and drive preventative improvements
Champion reliability in architectural reviews and production readiness

Observability & Monitoring

Design actionable, symptom-based alerts (not noise)
Build dashboards and tracing systems using tools like CloudWatch, Prometheus, Grafana, New Relic, X-Ray, and ADOT
Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
Ensure full observability coverage across critical paths

Cloud & Infrastructure

Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
Build resilient, multi-AZ and regionally replicated systems
Implement autoscaling and fault-tolerant architecture
Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)