Drive Reliability and Performance as a Senior Site Reliability Engineer
EPAM Systems
Job Description
Become a pivotal force in enhancing system reliability and performance for digital trading products. As a Lead Site Reliability Engineer, you’ll spearhead monitoring initiatives to ensure high availability and continuous improvement.
This role requires leadership within a team of site reliability engineers, overseeing infrastructure and application performance. You'll define a strategic reliability vision while ensuring stable connectivity to external partners. Responsibilities include optimizing monitoring systems and leading incident management efforts in a high-stakes environment.
Key Responsibilities:
- Define a reliability vision for the trading portfolio
- Oversee the SRE team, providing mentorship and guidance
- Own SLA/SLO/SLI frameworks and service health reporting
- Configure and optimize monitoring systems
- Analyze performance and manage critical incidents
Requirements:
- 8+ years in Site Reliability Engineering or DevOps
- Proven leadership experience in technical roles
- Strong experience with SLA/SLO/SLI governance
- Hands-on knowledge of Microsoft Azure environments
- Proficiency with Dynatrace in production settings
Elevate system reliability and performance through strategic initiatives, mentorship, and effective incident management in a dynamic trading environment.