Site Reliability Engineering Lead
Arting Digital Private Limited
Job Description
Posting title: Site Reliability Engineering Lead
Experience: 7+ Years
Location: Chennai
Work mode: On-site
Primary skills: Terraform, Ansible, AWS services, Docker, Kubernetes, CI/CD, Datadog/CloudWatch/Prometheus, Python/Bash, SRE/DevOps
Qualification: B.Tech / B.E. in Computer Science or MCA / M.Tech
Role Overview
We are looking for an experienced Senior Site Reliability Engineer (SRE) / DevOps Lead to design, build, and maintain highly scalable, reliable, and secure cloud infrastructure.
Key Responsibilities
- Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS.
- Lead and mentor a team of engineers, fostering best practices in SRE and DevOps.
- Build and manage Infrastructure-as-Code (IaC) using tools like Terraform, AWS CDK, or CloudFormation.
- Develop and maintain CI/CD pipelines using tools such as Jenkins, GitHub Actions, or GitLab CI.
- Implement containerization and orchestration solutions using Docker, Kubernetes, ECS, or EKS.
- Establish monitoring, alerting, and observability frameworks using tools like Datadog, Prometheus, Grafana, ELK, or CloudWatch.
- Drive incident management, root cause analysis (RCA), and continuous improvement of system reliability.
- Design and implement disaster recovery (DR) and high-availability (HA) strategies.
- Optimize cloud costs using FinOps practices and cost monitoring tools.
- Collaborate with cross-functional teams to improve system performance, scalability, and security.
- Automate infrastructure, deployments, and operational workflows.
- Implement security best practices, including IAM, networking, and compliance standards.
- Lead platform-wide automation and reliability initiatives.
Required Skills & Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
- At least 2 years of experience in a leadership, mentoring, or team management role.
- Strong hands-on experience with AWS services (EC2, S3, RDS, IAM, VPC, Lambda).
- Expertise in Infrastructure-as-Code (Terraform, AWS CDK, or CloudFormation).
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
- Proficiency in containerization and orchestration (Docker, Kubernetes, ECS/EKS).
- Strong experience with monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch).
- Solid scripting/programming skills (Python, Bash, or Go).
- Strong understanding of networking, cloud security, and identity/access management.
- Experience in designing high-availability and disaster recovery systems.
Preferred Qualifications
- AWS or equivalent cloud certifications (Solutions Architect, DevOps Engineer).
- Experience with AIOps, serverless architectures, and event-driven systems.
- Familiarity with FinOps and cloud cost optimization frameworks.
- Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).
- Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
- Experience working with SQL and NoSQL databases.
- Proven experience leading cross-functional reliability or automation initiatives.