Lead Site Reliability Engineer

Concentrix

Full Time · Lead

Anand, Gujarat, IN

Posted April 6, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring:

Python, Go, Bash, AWS, GCP, Azure, Kubernetes, Terraform, Git, CI/CD

Job Description


About the Role

As a Lead Site Reliability Engineer, you will own the reliability and availability of our production systems. You will champion SRE principles across engineering teams: defining SLOs, managing error budgets, and fostering a culture of blameless incident response. This is a hands-on leadership role in which you will partner closely with product and engineering teams to balance the pace of innovation with the stability our customers depend on.

Key Responsibilities

Reliability Ownership

· Define, implement, and own Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets across critical services.

· Use error budget policies to drive data-informed conversations between engineering and product on release velocity vs. reliability trade-offs.

· Conduct capacity planning and proactive risk assessments to prevent incidents before they occur.
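To make the error budget arithmetic behind these conversations concrete, here is a minimal sketch in Python. It assumes a request-based SLI (successful requests divided by total requests) measured over a fixed window; the function names, the traffic numbers, and the freeze-at-zero release rule are hypothetical illustrations, not this team's actual policy.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # A 99.9% SLO leaves 0.1% of requests as the error budget.
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = error_budget(slo_target, total_requests)
    return (budget - failed_requests) / budget


def should_freeze_releases(remaining: float, threshold: float = 0.0) -> bool:
    # Hypothetical policy: pause feature releases once the remaining budget
    # drops to or below the threshold (by default, when it is exhausted).
    return remaining <= threshold


if __name__ == "__main__":
    # Hypothetical window: 10 million requests against a 99.9% SLO.
    slo, total, failed = 0.999, 10_000_000, 4_000
    remaining = budget_remaining(slo, total, failed)
    print(f"budget: {error_budget(slo, total):,.0f} failed requests")  # budget: 10,000 failed requests
    print(f"remaining: {remaining:.0%}")  # remaining: 60%
    print("freeze releases:", should_freeze_releases(remaining))  # freeze releases: False
```

With a 99.9% SLO over 10 million requests, the budget is about 10,000 failed requests; after 4,000 failures roughly 60% remains, so under this illustrative policy feature releases would continue.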

Incident Management

· Lead incident response as incident commander — coordinating teams, driving resolution, and maintaining clear stakeholder communication during outages.

· Facilitate thorough, blameless postmortems and ensure action items are tracked, prioritized, and resolved.

· Develop and continuously improve runbooks, escalation paths, and on-call practices to reduce mean time to detect (MTTD) and mean time to recover (MTTR).
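As a rough illustration of how mean time to detect (MTTD) and mean time to recover (MTTR) can be computed, the sketch below averages durations over a list of incident records. The `Incident` record and its fields are hypothetical, and both metrics are measured here from the moment the incident started; organizations define MTTR differently (some measure from detection instead), so treat these definitions as assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record; real data would come from an incident tracker.
    started: datetime
    detected: datetime
    resolved: datetime


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: average gap between incident start and detection."""
    return timedelta(seconds=mean((i.detected - i.started).total_seconds() for i in incidents))


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recover, measured here from incident start to resolution."""
    return timedelta(seconds=mean((i.resolved - i.started).total_seconds() for i in incidents))


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, 12, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=45)),
        Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=75)),
    ]
    print("MTTD:", mttd(incidents))  # MTTD: 0:10:00
    print("MTTR:", mttr(incidents))  # MTTR: 1:00:00
```

Tracking both numbers separately shows whether improvements come from faster detection (better alerting) or faster resolution (better runbooks and escalation paths).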

Observability & Monitoring

· Design and maintain observability strategies using modern tooling (Prometheus, Grafana, OpenTelemetry, ELK) to ensure full visibility into system health.

· Define actionable alerting that surfaces real problems while minimizing alert fatigue.

· Drive adoption of distributed tracing and structured logging across services.

Toil Reduction & Automation

· Identify and measure toil across the engineering organization and lead initiatives to eliminate it through automation.

· Build internal tooling and self-service capabilities that improve developer productivity and system reliability.

Infrastructure & Platform Reliability

· Collaborate with platform and infrastructure teams on cloud-native patterns for fault tolerance, auto-scaling, and disaster recovery.

· Provide SRE input into CI/CD pipelines and deployment strategies (e.g., canary releases, blue/green deployments) to minimize production risk.

· Manage infrastructure using IaC practices (Terraform or equivalent) with a focus on reliability and consistency.

Leadership & Culture

· Mentor and grow junior SREs, fostering a culture of ownership, curiosity, and continuous improvement.

· Act as an SRE advocate across engineering — embedding reliability thinking into the software development lifecycle.

· Partner with key stakeholders to align SRE strategy with broader organizational goals.

· Conduct regular 1:1s with direct reports and participate in team rituals.

AI Expectations

As with all engineers at our organization, this role requires an AI-native mindset. Specifically, you will be expected to:

· Embed AI tools and practices into how we build and run our platform — deploying AI-powered capabilities and shipping real AI features into production.

· Support engagement and solutioning for AI-powered offerings, translating technical capabilities into tangible business value.

· Collaborate with cross-functional partners — including Product, Data, Security, and Legal — to ensure AI is delivered safely, effectively, and in compliance with relevant standards.