Site Reliability Engineer (Production Support & Incident Management)

Full Time, Mid-level, Hybrid

Toronto, Ontario, CA
Posted March 11, 2026

Title: Site Reliability Engineer (Production Support & Incident Management)

Role: Site Reliability Engineer

Location: Toronto

Work Mode: Hybrid (4 days per week in office)

Primary Skills: Production Support and Incident Management

Experience Required: 6-8 years

Must Have:

Proven experience in site reliability engineering, software engineering, and system administration.
Familiarity with distributed platforms, mainframe systems, and CRM applications.

Hands-on experience in a variety of tools and languages, such as:

DevOps CI/CD
Dynatrace
Splunk
PagerDuty
ServiceNow
Software engineering experience with production-class delivery, a strong analytical mindset, communication skills, and a sense of ownership and drive.
Intermediate experience in a variety of environments and platforms, such as:
Cloud
Distributed
Business workflows and services/APIs
Mainframe – JCL, COBOL, DB2
UNIX
Kafka

Emerging technologies experience, such as:

Shell scripting: the ability to read, understand, modify, and write non-trivial UNIX shell scripts is required.

Knowledge of software built on a cloud-native stack, such as Spring Boot, Spring Cloud, Cloud Foundry, and OpenShift.
Working experience in one or more of:
Algorithm design and optimization
Large-scale systems and/or parallel or distributed systems
Web API
MongoDB
RDBMS and/or modern scale-out databases
Experience with Agile (Scrum) methodology