Site Reliability Engineering (SRE)
Viraaj HR Solutions Private Limited
Job Description
You will join a fast-moving operations and platform team responsible for the availability, scalability, and observability of critical production systems. The team operates and optimizes production Kubernetes clusters and the underlying Linux-based infrastructure. Your key responsibilities will include:
- Implementing Infrastructure as Code (IaC) using Terraform and managing cloud resources across AWS for secure, repeatable deployments.
- Building and maintaining monitoring, logging, and alerting stacks driven by SLO/SLI, including implementing Prometheus metrics, dashboarding, and automated alerts with clear runbooks.
- Designing and owning CI/CD pipelines to automate build, test, and release workflows, reducing manual deployments and rollout risk.
- Leading incident response and postmortems, triaging outages, identifying root causes, and implementing remediation and reliability improvements.
- Developing automation tools and scripts to eliminate operational toil and support capacity planning, and collaborating with development teams on performance and reliability engineering.
Qualifications Required:
- Must-Have: Kubernetes, Linux, Terraform, AWS, Prometheus, CI/CD
- Preferred: Grafana, Python, ELK Stack
The company offers hands-on ownership of production systems with clear career growth opportunities into platform and reliability leadership roles. You will work in a collaborative engineering culture that prioritizes automation, observability, and measurable SLAs. This on-site role provides close collaboration with cross-functional product and infrastructure teams.
If you have the required skills and are open to on-site roles in India, we encourage you to apply for this position. This role is ideal for engineers who enjoy automating infrastructure, improving service reliability, and driving operational excellence.