Senior Site Reliability Engineer

Devopie Inc.

Full Time, Senior

Hamilton, Ontario, CA

Posted 6 weeks ago

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Java, Go, Ruby, Bash, AWS, Docker, Kubernetes, Terraform, Linux, Unix, PostgreSQL, MongoDB, Redis, GitHub, RabbitMQ, CI/CD, SaaS

Job Description

💡 What You’ll Do

You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.

🔎 Reliability Engineering

Define and manage SLIs, SLOs, and error budgets
Reduce MTTD, MTTA, and MTTR through structured incident response
Conduct blameless postmortems and drive preventative improvements
Champion reliability in architectural reviews and production readiness

📊 Observability & Monitoring

Design actionable, symptom-based alerts (not noise)
Build dashboards and tracing systems using tools like CloudWatch, Prometheus, Grafana, New Relic, X-Ray, ADOT
Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
Ensure full observability coverage across critical paths

☁️ Cloud & Infrastructure

Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
Build resilient multi-AZ and regionally replicated systems
Implement autoscaling and fault-tolerant architecture
Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)

🤖 Automation & Toil Reduction

Eliminate manual processes through automation
Build self-healing infrastructure
Improve CI/CD pipelines with safe deployment strategies (canary releases, feature flags)
Write production-quality code (not just scripts) in Python, Go, Ruby, Bash, or Java

📈 Performance & Capacity Planning