Drive Reliability and Performance as a Senior Site Reliability Engineer

EPAM Systems
Be an Early Applicant | Full Time | Senior
CA | Posted April 7, 2026

Job Description

Become a pivotal force in enhancing system reliability and performance for digital trading products. As a Lead Site Reliability Engineer, you’ll spearhead monitoring initiatives to ensure high availability and continuous improvement.

This role leads a team of SREs and oversees infrastructure and application performance. You'll define a strategic reliability vision while ensuring stable connectivity to external partners. Responsibilities include optimizing monitoring systems and leading incident management efforts in a high-stakes environment.

Key Responsibilities:

  • Define a reliability vision for the trading portfolio
  • Oversee the SRE team, providing mentorship and guidance
  • Own SLA/SLO/SLI frameworks and service health reporting
  • Configure and optimize monitoring systems
  • Analyze performance and manage critical incidents

Requirements

  • 8+ years in Site Reliability Engineering or DevOps
  • Proven leadership experience in technical roles
  • Strong experience with SLA/SLO/SLI governance
  • Hands-on knowledge of Microsoft Azure environments
  • Proficiency with Dynatrace in production settings

Elevate system reliability and performance through strategic initiatives, mentorship, and effective incident management in a dynamic trading environment.
