Lead Site Reliability Engineer

EPAM Systems, Inc.
Full Time, Lead
CA
Posted February 18, 2026

Job Description

Join our team as a Lead Site Reliability Engineer to drive system reliability, observability, and performance monitoring for mission-critical digital trading products.

You will lead monitoring initiatives in a high-availability trading environment, ensuring stable connectivity to external partners while proactively identifying opportunities for continuous improvement. At EPAM, you'll work on cutting-edge technologies, solve complex challenges, and shape the future of digital innovation. With access to continuous learning, mentorship, and global projects, your expertise will drive meaningful change.

The recruiting efforts for this position are intended to fill an existing vacancy for a new position.

Req# 968473077

Responsibilities

  • Define and implement a strategic reliability vision for the trading portfolio, covering infrastructure, network connectivity, application performance, and throughput
  • Lead and oversee a team of SRE engineers, providing technical direction, mentorship, and performance guidance
  • Own and evolve the SLA/SLO/SLI framework, including error budgets and service health reporting
  • Configure and optimize comprehensive monitoring and alerting systems across infrastructure and applications
  • Drive observability best practices using APM and monitoring platforms (e.g., Dynatrace)
  • Analyze application and infrastructure performance to isolate fault domains and determine root causes of critical incidents
  • Lead major incident management, coordinate resolution efforts, and conduct blameless postmortems
  • Participate in 24x7x365 support rotation and ensure operational excellence across the team
  • Identify automation opportunities to improve reliability, scalability, and operational efficiency

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering
  • Proven leadership experience (technical lead or team lead), with ability to oversee and mentor engineers
  • Strong hands-on experience with SLA/SLO/SLI definition, governance, and reporting
  • Solid experience working in Microsoft Azure environments (IaaS, PaaS, networking, monitoring)
  • Hands-on experience with Dynatrace (configuration, alerting, dashboards, performance analysis)
  • Experience with observability, monitoring, and APM tools in production environments
  • Ability to operate effectively under pressure in time-sensitive, high-impact environments

We offer

  • Extended Healthcare with Prescription Drugs, Dental and Vision, and Healthcare Spending Account (Company Paid)
  • Life and AD&D Insurance (Company Paid)
  • Employee Assistance Program (Company Paid)
  • Telehealth (Company Paid)
  • Short-Term Disability (Company Paid)
  • Long-Term Disability
  • Paid Time Off (including vacation and sick days)
  • Registered Retirement Savings Plan (RRSP) with Company match
  • Maternity/Parental/Adoption Leave Top-up
  • Employee Stock Purchase Program
  • Critical Illness Insurance
  • Employee Discounts
  • Unlimited access to LinkedIn Learning solutions

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our clients, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Engineer the Future with a Career at EPAM

This posting includes a base salary range EPAM Canada would reasonably expect to pay the selected candidate. Individual compensation offers within the range are based on a variety of factors, including, but not limited to, experience, credentials, education, training, the demand for the role, skillset, and overall business and local labour market considerations. Most candidates are hired at a salary within the range disclosed. Salary range: CA$140,000-CA$178,000. In addition, the details highlighted in this job posting above are a general description of all other expected benefits and compensation for the position.

EPAM Canada welcomes and encourages applications from candidates with disabilities. Please contact WFA Human Resource CA at WFAHRCA@epam.com if you have questions in this regard, or if you require an accommodation to complete the application process. Click here to review EPAM’s Accessibility for Ontarians with Disabilities Accessibility Policies and Multi-Year Access.

An artificial intelligence system is software that is developed with one or more techniques that can, for a given set of human-defined objectives, using algorithmic information processing, generate outputs such as content, predictions, recommendations, or decisions with varying levels of autonomy (“AI”). Tasks that humans have traditionally done by thinking and reasoning a
