Site Reliability Engineer – Dynatrace

Astra North Infoteck Inc.
Full Time, Mid-level
Toronto, Ontario, CA
Posted March 13, 2026

Job Description

Skills: Dynatrace, Observability, Monitoring Engineering, SRE Practices

Experience: 6-8 years

We are seeking a highly skilled Dynatrace Monitoring Engineer / Site Reliability Engineer (SRE) responsible for designing, implementing, and maintaining observability solutions across enterprise applications and infrastructure. This role focuses on proactive monitoring, performance visibility, incident prevention, and enforcing reliability standards through service-level objectives (SLOs). The ideal candidate brings deep Dynatrace expertise along with strong troubleshooting, communication, and architectural awareness.

Key Responsibilities

Dynatrace Engineering & Monitoring

Design, configure, and maintain Dynatrace dashboards, alerting rules, and synthetic monitoring for business-critical URLs.

Build customized dashboards for:

Application Performance (APM)

Infrastructure monitoring (hosts, processes, services)

Kubernetes & cloud workloads

Business metrics & SLA/SLO insights

Use DQL (Dynatrace Query Language) to create advanced tiles, analytic views, and metric visualizations.

Standardize dashboards to be reusable, scalable, and aligned with business KPIs.

Observability & SRE Practices

Define and manage Service Level Objectives (SLOs) to measure availability, reliability, and operational performance.

Exercise key SRE decision rights (e.g., rejecting operationally substandard software, advising developers on improvements).

Implement observability requirements ensuring systems meet expected service levels with proper operational characteristics.

Focus on reliability, scalability, and performance of production computing systems, including complex distributed systems.

Develop observability standards that ensure predictable system behavior and early detection of errors or failures.

Incident Management & Problem Resolution

Conduct root cause analysis (RCA) through post‑mortem reviews, ensuring permanent remediation and preventing recurrence.

Provide strong troubleshooting for application, infrastructure, and integration-level monitoring issues.

Integrate Dynatrace and monitoring workflows with ITSM platforms.

Cross‑Functional Collaboration

Work closely with infrastructure, application, cloud, and security teams to ensure seamless operational monitoring.

Lead or contribute to enterprise-wide initiatives as a subject matter expert.

Interact with governance, audit, compliance, and risk groups to provide observability insights and ensure adherence to standards.

Identify emerging technologies and propose innovative enhancements to monitoring and reliability engineering practices.

Essential Skills

Strong hands-on experience with Dynatrace SaaS/Managed, including dashboard creation, alert configuration, and synthetic monitoring.

Strong understanding of APM concepts, infrastructure monitoring, cloud monitoring, and (preferably) Kubernetes/microservices environments.

Familiarity with DQL, metrics, entity models, and relationships within Dynatrace.

Experience integrating Dynatrace or similar monitoring tools with ITSM systems.

Excellent troubleshooting and communication skills.

Strong foundation in networking, reliability engineering, scalability, and cloud operational characteristics.

Ability to drive SRE practices such as:

SLO creation

Release readiness assessments

Operational risk evaluation

Continuous improvement through automation and observability standards
