Senior Site Reliability Engineer

Devopie Inc.
Full Time · Senior
Hamilton, Ontario, CA · Posted 6 weeks ago

Job Description

💡 What You’ll Do

You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.

🔎 Reliability Engineering

  • Define and manage SLIs, SLOs, and error budgets
  • Reduce MTTD, MTTA, and MTTR through structured incident response
  • Conduct blameless postmortems and drive preventative improvements
  • Champion reliability in architectural reviews and production readiness
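The SLI / SLO / error-budget relationship in the first bullet comes down to a little arithmetic. The Python sketch below is illustrative only: it assumes a request-based availability SLI measured over a fixed window, and `SloWindow` and its methods are hypothetical names, not Devopie tooling.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Hypothetical helper: a request-based availability SLO over one window."""
    slo_target: float     # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int   # all requests observed in the window
    failed_requests: int  # requests the SLI counts as bad (assumed: e.g. 5xx)

    def sli(self) -> float:
        """Fraction of good requests in the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the budget left; negative once the SLO is breached."""
        budget = self.error_budget()
        if budget == 0:
            return 1.0 if self.failed_requests == 0 else float("-inf")
        return 1 - self.failed_requests / budget

    def burn_rate(self) -> float:
        """Observed error rate / rate the SLO allows (1.0 = exactly on pace)."""
        if self.total_requests == 0:
            return 0.0
        allowed = 1 - self.slo_target
        error_rate = self.failed_requests / self.total_requests
        if allowed <= 0:
            return 0.0 if error_rate == 0 else float("inf")
        return error_rate / allowed
```

For example, a 99.9% SLO over 1,000,000 requests tolerates roughly 1,000 failures; 250 failures leaves about 75% of the budget and corresponds to a burn rate of about 0.25. Alerting on burn rate rather than raw error counts is one way to keep alerts symptom-based.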

📊 Observability & Monitoring

  • Design actionable, symptom-based alerts (not noise)
  • Build dashboards and tracing systems using tools such as CloudWatch, Prometheus, Grafana, New Relic, AWS X-Ray, and AWS Distro for OpenTelemetry (ADOT)
  • Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
  • Ensure full observability coverage across critical paths

☁️ Cloud & Infrastructure

  • Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
  • Build resilient, multi-AZ and regionally replicated systems
  • Implement autoscaling and fault-tolerant architecture
  • Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)
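On Kubernetes, the autoscaling bullet is commonly handled by the Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). A simplified Python sketch of that rule, assuming a single metric and ignoring stabilization windows and pod-readiness handling; `hpa_desired_replicas` is a hypothetical name.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling rule for one metric (hypothetical helper).

    Ignores stabilization windows, pod readiness, and missing metrics,
    which the real controller also takes into account.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while at 63% (ratio 1.05) the tolerance keeps them at 4.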

🤖 Automation & Toil Reduction

  • Eliminate manual processes through automation
  • Build self-healing infrastructure
  • Improve CI/CD pipelines with safe deployment strategies (canary releases, feature flags)
  • Write production-quality code (not just scripts) in Python, Go, Ruby, Bash, or Java
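Canary releases and feature flags often rely on deterministic percentage bucketing, so a given user stays consistently in or out of a rollout across requests. A minimal sketch of one common approach, hashing the flag name and user ID with SHA-256; `in_rollout` is a hypothetical helper, not the API of any particular feature-flag product.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: float) -> bool:
    """Deterministically decide whether user_id falls inside a percentage rollout.

    Hypothetical helper: hashes "flag:user" so each flag buckets users
    independently and the same user always lands in the same bucket.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, 0.01% granularity
    return bucket < percent * 100
```

Because buckets are stable, raising a canary from 5% to 25% only adds users; nobody already in the canary drops out, which keeps comparisons between canary and baseline clean.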

📈 Performance & Capacity Planning

  • Analyze system metrics and traffic patterns
  • Conduct load testing, chaos testing, and capacity modeling
  • Identify bottlenecks and proactively optimize systems
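A common starting point for capacity modeling is Little's Law (L = λ·W: average requests in flight equal arrival rate times average time in system). The Python sketch below turns that into a rough instance count with headroom; the function name, parameters, and the 70% default utilization are illustrative assumptions, and load testing is what validates the estimate.

```python
import math


def instances_needed(
    peak_rps: float,
    avg_latency_s: float,
    concurrency_per_instance: int,
    target_utilization: float = 0.7,
) -> int:
    """Rough instance count from Little's Law, leaving headroom (hypothetical helper)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Little's Law: average requests in flight = arrival rate x average latency.
    in_flight = peak_rps * avg_latency_s
    # Plan to use only a fraction of each instance's concurrency.
    usable_per_instance = concurrency_per_instance * target_utilization
    return math.ceil(in_flight / usable_per_instance)
```

At a hypothetical 2,000 req/s peak and 250 ms average latency, about 500 requests are in flight; with 64 concurrent requests per instance at 70% target utilization, the sketch returns 12 instances. Little's Law describes long-run averages in a stable system, so bursts and tail latency still need margin on top.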

🤝 Cross-Functional Collaboration

You’ll work closely with:

  • Engineering & Platform teams on scalable system design
  • Security teams on IAM, KMS, GuardDuty, and secrets management
  • Product leaders to align reliability with roadmap priorities
  • Cloud vendors and SaaS providers during critical incidents

🧠 What You Bring

Must-Have Experience

  • Bachelor’s degree in Computer Science, Software Engineering, or related field
  • Strong Linux/Unix systems knowledge
  • Deep AWS experience
  • Hands-on experience with Kubernetes (EKS), ECS, Docker, and container orchestration
  • Infrastructure as Code (Terraform, CDK, CloudFormation)
  • Production on-call and incident management experience
  • Strong understanding of MTTx metrics (MTTD, MTTR, MTBF, etc.)
  • Experience with MongoDB, PostgreSQL, Redis, RabbitMQ
  • Experience with observability and monitoring platforms
  • CI/CD pipeline experience (GitHub, Kubernetes, etc.)
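MTTx definitions vary between teams (for example, whether MTTR is measured from failure start or from detection). A minimal Python sketch, assuming per-incident timestamps for start, detection, acknowledgement, and resolution, with MTTR measured from start and MTBF taken as the mean gap between one incident's resolution and the next one's start; `Incident` and `mttx` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime       # when the failure actually began
    detected: datetime      # when monitoring/alerting noticed it
    acknowledged: datetime  # when on-call acknowledged the page
    resolved: datetime      # when service was restored


def _mean_minutes(deltas: list[timedelta]) -> float:
    return mean(d.total_seconds() / 60 for d in deltas)


def mttx(incidents: list[Incident]) -> dict[str, float]:
    """Return MTTD, MTTA, MTTR (and MTBF if there are 2+ incidents), in minutes."""
    if not incidents:
        raise ValueError("need at least one incident")
    metrics = {
        "MTTD": _mean_minutes([i.detected - i.started for i in incidents]),
        "MTTA": _mean_minutes([i.acknowledged - i.detected for i in incidents]),
        "MTTR": _mean_minutes([i.resolved - i.started for i in incidents]),
    }
    ordered = sorted(incidents, key=lambda i: i.started)
    if len(ordered) >= 2:
        # Assumed MTBF definition: uptime between one resolution and the next failure.
        metrics["MTBF"] = _mean_minutes(
            [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
        )
    return metrics
```

With two incidents a day apart (detected after 4 and 6 minutes, acknowledged 2 and 4 minutes after detection, resolved 30 and 50 minutes after start), this reports MTTD 5, MTTA 3, MTTR 40, and MTBF 1,410 minutes.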

Nice-to-Have

  • Performance engineering and chaos testing
  • Experience in fintech or regulated environments
  • Knowledge of distributed storage systems (NFS, HDFS, Ceph, S3)
  • Familiarity with dynamic resource frameworks (Kubernetes, Mesos, YARN)

About Devopie Inc.

DevOps · On-site