Site Reliability Engineer – AWS
KTek Resourcing
Job Description
Job Title: SRE – AWS
Location: Hybrid – Toronto (GTA), EST
Role Overview
Own reliability, resiliency, and operational excellence for mission-critical AWS serverless platforms. The role ensures high availability, low mean time to recovery (MTTR), and strong production governance, using Dynatrace-driven observability.
Key Focus Areas
- Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event-driven systems)
- SLOs / SLIs / Error Budgets for critical APIs
- Incident analysis and post-incident reviews
- Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
- Operational excellence improvements: incident reduction, MTTR improvement, toil automation
- Reliability guardrails embedded into CI/CD and production readiness reviews
Core Responsibilities
- Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
- Lead major incidents and drive actionable RCAs with sustained fixes
- Build signal-driven alerts aligned to SLOs (noise reduction focus)
- Enable automation & self-healing where feasible
Required Experience
- 5–6+ years in SRE / DevOps / Production Engineering
- Deep hands-on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
- Strong expertise in Dynatrace for serverless monitoring & triage
- Proven success improving availability, MTTR, and incident trends
- Solid coding/scripting (Python / Java / Node.js)