Snapmint - Engineering Manager - Site Reliability

Snapmint
Full Time · Manager
Haryana, IN · Posted March 10, 2026

Job Description

We are seeking an experienced Engineering Manager (SRE) to lead our Site Reliability Engineering team. This role combines technical leadership, people management, and operational excellence to ensure our systems are reliable, scalable, secure, and highly available. You will drive reliability strategy, improve operational processes, and build a high-performing SRE team.

Title: EM - SRE

Experience: 8-12 years

Work Location: Gurgaon (Unitech Cyber Park, Sector 39)

Working Arrangement: 5 days a week, work from office (WFO)

Key Responsibilities

Leadership & Team Management

  • Build, mentor, and manage a high-performing SRE team.
  • Set clear goals, conduct performance reviews, and support career growth.
  • Foster a culture of reliability, automation, and continuous improvement.
  • Collaborate cross-functionally with Engineering, Product, Security, and DevOps teams.

Reliability & Operations

  • Define and manage SLIs, SLOs, and error budgets.
  • Ensure system reliability, performance, scalability, and availability.
  • Lead incident management, root cause analysis (RCA), and postmortems.
  • Drive improvements in observability, monitoring, and alerting systems.

Infrastructure & Automation

  • Oversee cloud infrastructure (AWS/GCP/Azure) and on-prem environments.
  • Promote infrastructure-as-code (Terraform, CloudFormation, etc.).
  • Drive automation to reduce toil and improve system efficiency.
  • Improve CI/CD pipelines and deployment reliability.

Strategy & Execution

  • Develop and execute an SRE roadmap aligned with business objectives.
  • Improve system resilience and disaster recovery processes.
  • Ensure compliance with security and regulatory requirements.
  • Track and report reliability metrics to leadership.

Required Qualifications

  • 8+ years of experience in software engineering, DevOps, or SRE.
  • 2+ years of engineering management experience.
  • Strong expertise in cloud platforms (AWS/GCP/Azure).
  • Deep understanding of distributed systems and system architecture.
  • Experience with monitoring tools (Datadog, Prometheus, Grafana, New Relic, etc.).
  • Proficiency in at least one programming language (Python, Go, Java, etc.).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Strong incident management and production operations experience.

(ref:hirist.tech)
