Site Reliability Engineer (SRE) – AI & Incident Management
Praxis HR Solution
Job Description
Job Title
Site Reliability Engineer (SRE) – AI & Incident Management
Location
Pune | Gurugram | Noida (Hybrid / On-site)
Employment Type
Full-Time
Notice Period
Immediate to 30 days
Job Summary
We are looking for a highly motivated Site Reliability Engineer (SRE) with strong expertise in AI-driven systems and incident management. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of critical production systems. This role requires hands-on experience in automation, monitoring, and incident response to maintain high system availability.
Key Responsibilities
- Ensure high availability, reliability, and performance of production systems.
- Monitor infrastructure and applications to detect and resolve issues proactively.
- Manage incident response, troubleshooting, and root cause analysis (RCA).
- Implement automation to improve operational efficiency and reduce manual effort.
- Work closely with development teams to improve system reliability and deployment processes.
- Utilize AI/ML tools or AI-enabled platforms to enhance monitoring and incident prediction.
- Define and track SLIs and SLOs, and meet SLA commitments for system reliability.
- Build and maintain observability solutions (logging, metrics, tracing).
- Participate in on-call rotations and handle production incidents.
Required Skills
- Strong experience in Site Reliability Engineering (SRE)
- Hands-on experience with Incident Management and Production Support
- Knowledge of AI tools / AI-driven automation / AI-based monitoring
- Experience with Cloud Platforms (AWS / Azure / GCP)
- Familiarity with Monitoring Tools (Prometheus, Grafana, Datadog, Splunk, etc.)
- Experience with Linux / scripting (Python, Bash)
- Knowledge of CI/CD pipelines and DevOps practices
- Understanding of containerization (Docker, Kubernetes)
Preferred Qualifications
- Experience with AIOps platforms
- Knowledge of Infrastructure as Code (Terraform / Ansible)
- Strong debugging and problem-solving skills
- Experience working with high-availability distributed systems
Why Join Us
- Opportunity to work on modern AI-driven infrastructure
- Exposure to large-scale production environments
- Collaborative and growth-focused work culture
How to Apply
Interested candidates with a notice period of 30 days or less (including immediate joiners) can apply via Indeed or share their updated resume.
Job Types: Full-time, Permanent
Pay: ₹1,200,000.00 per year
Benefits
- Cell phone reimbursement
- Food provided
- Health insurance
- Paid sick time
- Paid time off
- Provident Fund
- Work from home
Work Location: In person