Site Reliability Engineer (SRE) – AI & Incident Management

Praxis HR Solution

Full Time | Mid | Hybrid

Gurugram, Haryana, IN | Posted March 12, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Bash, AWS, GCP, Azure, Docker, Kubernetes, Terraform, Ansible, Linux, CI/CD, DevOps

Job Description

Job Title

Site Reliability Engineer (SRE) – AI & Incident Management

Location

Pune | Gurugram | Noida (Hybrid / On-site)

Employment Type

Full-Time

Notice Period

Immediate Joiners to 30 Days

Job Summary

We are looking for a highly motivated Site Reliability Engineer (SRE) with strong expertise in AI-driven systems and Incident Management. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of critical production systems. This role requires hands-on experience in automation, monitoring, and incident response to maintain high system availability.

Key Responsibilities

Ensure high availability, reliability, and performance of production systems.
Monitor infrastructure and applications to detect and resolve issues proactively.
Manage incident response, troubleshooting, and root cause analysis (RCA).
Implement automation to improve operational efficiency and reduce manual effort.
Work closely with development teams to improve system reliability and deployment processes.
Utilize AI/ML tools or AI-enabled platforms to enhance monitoring and incident prediction.
Define and track SLIs and SLOs, and maintain SLA compliance, for system reliability.
Build and maintain observability solutions (logging, metrics, tracing).
Participate in on-call rotations and handle production incidents.