
Site Reliability Engineering Lead

Arting Digital Private Limited
Full Time, Senior
Tamil Nadu, IN
Posted March 18, 2026

Job Description

Posting title: Site Reliability Engineering Lead

Experience: 7+ Years

Location: Chennai

Work mode: On-site

Primary skills: Terraform, Ansible, AWS services, Docker, Kubernetes, CI/CD, Datadog/CloudWatch/Prometheus, Python/Bash, SRE/DevOps

Qualification: B.Tech / B.E. in Computer Science or MCA / M.Tech

Role Overview

We are looking for an experienced Senior Site Reliability Engineer (SRE) / DevOps Lead to design, build, and maintain highly scalable, reliable, and secure cloud infrastructure.

Key Responsibilities

  • Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS.
  • Lead and mentor a team of engineers, fostering best practices in SRE and DevOps.
  • Build and manage Infrastructure-as-Code (IaC) using tools like Terraform, AWS CDK, or CloudFormation.
  • Develop and maintain CI/CD pipelines using tools such as Jenkins, GitHub Actions, or GitLab CI.
  • Implement containerization and orchestration solutions using Docker, Kubernetes, ECS, or EKS.
  • Establish monitoring, alerting, and observability frameworks using tools like Datadog, Prometheus, Grafana, ELK, or CloudWatch.
  • Drive incident management, root cause analysis (RCA), and continuous improvement of system reliability.
  • Design and implement disaster recovery (DR) and high-availability (HA) strategies.
  • Optimize cloud costs using FinOps practices and cost monitoring tools.
  • Collaborate with cross-functional teams to improve system performance, scalability, and security.
  • Automate infrastructure, deployments, and operational workflows.
  • Implement security best practices, including IAM, networking, and compliance standards.
  • Lead platform-wide automation and reliability initiatives.

Required Skills & Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
  • At least 2 years of experience in a leadership, mentoring, or team management role.
  • Strong hands-on experience with AWS services (EC2, S3, RDS, IAM, VPC, Lambda).
  • Expertise in Infrastructure-as-Code (Terraform, AWS CDK, or CloudFormation).
  • Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
  • Proficiency in containerization and orchestration (Docker, Kubernetes, ECS/EKS).
  • Strong experience with monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch).
  • Solid scripting/programming skills (Python, Bash, or Go).
  • Strong understanding of networking, cloud security, and identity/access management.
  • Experience in designing high-availability and disaster recovery systems.

Preferred Qualifications

  • AWS or equivalent cloud certifications (Solutions Architect, DevOps Engineer).
  • Experience with AIOps, serverless architectures, and event-driven systems.
  • Familiarity with FinOps and cloud cost optimization frameworks.
  • Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).
  • Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
  • Experience working with SQL and NoSQL databases.
  • Proven experience leading cross-functional reliability or automation initiatives.
