Site Reliability Engineering Lead

Mississauga, Ontario, CA | $121k – $171k | Posted 7 weeks ago

Role Overview

Professional is hiring a Site Reliability Engineering Lead. This is a full-time hybrid role based in Mississauga and is part of Professional's DevOps hiring. The posted range is $121k to $171k. Full responsibilities, required qualifications, and the apply link are listed in the description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Java, Go, Kubernetes, MongoDB, Redis, CI/CD, DevOps

Job Description

Description

Engineer the future of global finance.

At Citi, our Tech team doesn’t just support finance – we are helping to redefine it. Every day, $5 trillion crosses through our network. We do business in 180+ countries, operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Look to join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning, and flexibility with potential hybrid work opportunities. Help solve real‑world challenges that touch millions and get the opportunity to build the future of finance with Citi Tech.

We are seeking an experienced and motivated team member to support our AI and DevOps Platform Support team in North America. This role is responsible for contributing to the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used across the organization. The ideal candidate will help lead a team of SRE and Support engineers, facilitate incident and problem resolution, and collaborate with engineering and development teams to enhance platform services and supportability. The role includes short‑term planning and coordination of actions and resources within the team.

Responsibilities

Demonstrate a strong understanding of how application support contributes to the overall technology function and organizational objectives.
Assist with vendor relationship management, including coordination with offshore managed services.
Support efforts to improve service levels for end users by enhancing operational efficiencies and strengthening incident management, problem management, and knowledge‑sharing practices.
Partner with development teams to guide improvements in application stability and supportability.
Contribute to frameworks for managing capacity, throughput, and latency.
Assist in defining and implementing application onboarding guidelines and standards.
Support team members by fostering a collaborative environment and encouraging skill development.
Participate in cost‑reduction efforts through Root Cause Analysis reviews, knowledge management, performance tuning, and user training.
Participate in business review meetings to help align technology tools and strategies with business requirements.
Ensure adherence to support processes and tool standards, and assist in enhancing processes to promote consistency and quality across the support program.
Perform other duties and functions as assigned.
Support platform leadership in defining the platform roadmap and partnering with engineering teams and business stakeholders.
Assist in executing resilience activities such as wargaming scenarios, chaos engineering tests, and disaster recovery drills.
Contribute to automation initiatives aimed at reducing manual toil and improving platform efficiency.
Support the enterprise‑wide observability strategy, including monitoring, logging, tracing, and alerting.
Maintain hands‑on familiarity with platform architecture and services as needed for operational support.
Assist in overseeing the operational health of production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are supported and incident processes are followed.
Help implement and operate effective monitoring and observability strategies to support proactive issue detection and system health assessments.

Qualifications

6–10 years of relevant experience in a hands‑on technical role.
Experience contributing to architecture discussions and ensuring solutions align with enterprise standards and long‑term maintainability.
Experience working with senior stakeholders or technology partners.
Demonstrated experience supporting IT service improvements or platform stability initiatives.
Strong communication and presentation skills, with the ability to convey technical concepts clearly.
Experience supporting or contributing to technical roadmaps or operational workstreams.
Experience participating in resilience‑related activities such as incident simulations, disaster recovery exercises, or stability testing.
Ability to collaborate with cross‑functional support teams and technology groups.
Strong organizational and workload‑planning skills.
Consistently demonstrates clear and concise written and verbal communication skills.
Ability to communicate appropriately with relevant stakeholders.
Working knowledge of Generative AI concepts preferred.
Experience with CI/CD and configuration management tools preferred.
Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
Experience writing or maintaining code in Java, Python, Go, or similar languages preferred.
Hands‑on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.

Education

Bachelor’s/University degree required; Master’s degree preferred.

------------------------------------------------------

Job Family Group:

Technology

------------------------------------------------------

Job Family:

Applications Support

------------------------------------------------------

Time Type:

Full time

------------------------------------------------------

Primary Location Full Time Salary Range:

$120,800.00 - $170,800.00

------------------------------------------------------

Most Relevant Skills

Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills

For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Automated Processing and AI

We use automated processing, including artificial intelligence, for our legitimate business interests (or our reasonable and appropriate business purposes) to identify and align the candidate's skills and abilities with a specific job opening. Additionally, if you so choose, or consent, we can match your skills and abilities to other suitable roles at Citi.

Importantly, all our hiring processes and decisions, including determining your suitability for a role, are conducted, checked, and decided by individuals. Our automated processing and AI do not involve relying on automatic or autonomous decision-making. Please refer to any Jurisdictional Considerations, with specific provisions for your country (where relevant) for further details.

------------------------------------------------------

This job opening is for an existing job vacancy.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi’s EEO Policy Statement and the Know Your Rights poster.

Frequently Asked Questions

How do I apply for the Site Reliability Engineering Lead position at Professional?

Use the Apply button above to submit your application directly to Professional. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Is the Site Reliability Engineering Lead role at Professional remote or in-office?

This is a hybrid role based in Mississauga. Expect a mix of in-office and remote days, with the specific cadence set by the hiring manager.

How much does the Site Reliability Engineering Lead role at Professional pay?

Professional has posted a compensation range of $121k to $171k for this position. Final offers typically vary based on candidate experience, location, and internal salary bands.

When was the Site Reliability Engineering Lead role at Professional posted?

This role was posted on April 26, 2026 (53 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.

How much experience does the Site Reliability Engineering Lead role at Professional require?

This is a senior-level position. Most senior roles call for 5+ years of directly relevant experience. Professional lists its specific requirements in the job description, so review the must-have qualifications closely before applying.
