Site Reliability Engineer (SRE) – Observability
Astra North Infoteck Inc.
Job Description: Site Reliability Engineer (SRE) – Observability
Toronto - Hybrid (1-2 days in office)
Role Summary
We are looking for an Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.
You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Key Responsibilities
Observability Implementation
- Implement and maintain metrics, logs, and traces for applications and infrastructure
- Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
- Configure dashboards, alerts, and basic anomaly detection
Application Support & Instrumentation
- Work with development teams to enable structured logging, basic distributed tracing, and core metrics
- Validate observability requirements during Production Readiness Reviews (PRR)
- Troubleshoot missing or low-quality telemetry
Monitoring & Alerting
- Configure alerts based on golden signals (latency, errors, traffic, saturation)
- Help reduce alert noise by tuning thresholds and alert logic
- Support incident response by gathering logs, metrics, and traces
Operations & Reliability
- Support root cause analysis using observability tools
- Maintain dashboards and documentation used by on-call and support teams
- Participate in on-call rotations (as applicable)
Automation & Continuous Improvement
- Assist in automating observability onboarding and validation tasks
- Create and maintain reusable dashboards and alert templates
- Follow established observability standards and best practices
Required Qualifications
- 2–4 years of experience in observability or SRE
- Working knowledge of metrics, logs, and basic tracing concepts
- Hands-on experience with at least one observability platform (Dynatrace, Elastic/ELK, Datadog, New Relic, etc.)
- Basic understanding of SLIs/SLOs and service health indicators
- Experience with cloud platforms or hybrid environments
- Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
Preferred Qualifications
- Experience with OpenTelemetry or APM agents
- Familiarity with Kubernetes or containerized workloads
- Experience working with incident management tools (PagerDuty, ServiceNow)
- Exposure to Dynatrace, ELK (Kibana), or similar cloud-native monitoring tools
- Experience in regulated or enterprise environments