Job Description
Position: Principal Site Reliability Engineering Specialist (SRE)
Location:
Edmonton
Open to other locations within proximity to a CGI Office
Hybrid work model
We are hiring a Senior Site Reliability Engineer (SRE) with a strong foundation in building and operating reliable, scalable, and resilient cloud platforms. You bring a reliability and performance engineering mindset to everything you do—balancing operational stability with modernization and automation. In this role, you will apply core SRE practices—including SLIs/SLOs, observability, incident management, and operational automation—while temporarily supporting a regional support strategy engagement focused on assessing and strengthening large-scale operational environments.
You will work closely with platform, operations, and architecture teams to evaluate current-state practices, identify reliability and support gaps, and contribute to the definition of future-state operating models and implementation roadmaps. Beyond this engagement, the role is designed for ongoing, hands-on SRE delivery, where you will lead and implement monitoring, reliability engineering, automation, and tooling across cloud and hybrid environments.
You will collaborate with cross-functional teams to design, build, and continuously improve platform reliability, engineering standards, and operational excellence practices for mission-critical services. This position places you in a client-facing, high-impact environment, where your technical depth, operational judgment, and ability to translate reliability principles into practical outcomes will directly influence service stability, modernization efforts, and future cloud initiatives. If you are a proven SRE who thrives in complex environments and values both hands-on engineering and operational leadership, this role offers the opportunity to make a meaningful and lasting impact.
Your future duties and responsibilities:
Who are You?
You are a senior Site Reliability Engineer who thrives on solving complex reliability and operational challenges. You are curious, collaborative, and continuously focused on improving how platforms, infrastructure, and services are operated and supported. Your strength lies in applying sound engineering judgment to real-world operational problems, balancing reliability, performance, and maintainability. You are equally comfortable working hands-on with tools and systems and stepping back to assess how operational practices, support models, and workflows impact service reliability.
You can engage confidently in technical discussions with engineers while also communicating clearly with operational leaders and stakeholders to explain risks, trade-offs, and improvement opportunities.
With a mindset grounded in continuous improvement and learning, you champion modernization, automation, and pragmatic reliability practices. You are trusted for your ability to identify root causes rather than symptoms, to raise concerns early, and to translate reliability principles into practical, actionable outcomes. Your peers value your technical depth and calm leadership in complex environments, and teams rely on you to elevate operational maturity and execution quality.
At CGI, we recognize strong SRE practitioners and provide the environment and support for them to grow, contribute, and make a meaningful impact across engagements.
Responsibilities
- Develop, operate, and evolve monitoring, logging, and alerting capabilities across cloud and hybrid environments, while temporarily contributing SRE expertise to assess and rationalize existing operational monitoring practices as part of a regional support strategy initiative.
- Define, implement, and continuously improve SLIs, SLOs, and SLAs for platform and service reliability, applying these principles during the engagement to evaluate current-state service outcomes and inform future-state reliability targets.
- Lead and participate in incident response, problem investigation, and root cause analysis, leveraging hands-on SRE experience to identify systemic reliability issues and recurring operational failure patterns observed across regional support operations.
- Design and automate reliability and operational…