Site Reliability Engineer (Technology Support Engineer) (India)
Accenture India Private
Job Description
Role Overview:
As a Technology Support Engineer, you will play a crucial role in resolving incidents and problems across multiple business system components to ensure operational stability. Your responsibilities will include creating and implementing Requests for Change (RFC), updating knowledge base articles for effective troubleshooting, and collaborating with vendors and service management teams for issue analysis and resolution. Your expertise in Site Reliability Engineering will be essential in maintaining system stability, scalability, and high availability.
Key Responsibilities:
- Monitor and optimize system uptime, latency, and throughput, tracked as Service Level Indicators (SLIs), to meet Service Level Objectives (SLOs).
- Lead incident response, manage escalations, perform root cause analysis (RCA), and conduct postmortem reviews.
- Develop CI/CD pipelines, automate infrastructure management, and eliminate manual tasks through scripting and orchestration.
- Implement monitoring and observability frameworks (Prometheus, Grafana, ELK, Datadog) for real-time visibility into distributed systems.
- Conduct resource forecasting, design scalable infrastructure, and maintain performance under surge conditions.
- Collaborate with developers to ensure safe and reliable rollout of new features with automated testing and rollback mechanisms.
- Implement disaster recovery strategies, chaos tests, and failover automation for business continuity.
- Use post-incident analytics to refine operational practices and make data-driven reliability improvements.
- Collaborate with product, design, ML, and DevOps teams to build intelligent workflows and user experiences.
- Implement Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, Azure DevOps, or Pulumi.
- Demonstrate expertise in Cloud IaaS and PaaS services.
Qualifications Required:
- Educational Qualification: 15 years of full-time education.
- Minimum of 7.5 years of experience in Site Reliability Engineering.
- Expertise in Python, Go, Bash, or JavaScript for automation and tooling.
- Hands-on experience with cloud environments such as AWS, Azure, and GCP, and with orchestration tools like Kubernetes and Terraform.
- Deep understanding of Linux systems, networking, and distributed architectures.
- Proficiency in observability solutions like Prometheus, Grafana, Datadog, CloudWatch, or New Relic.
- Familiarity with incident management and alerting platforms such as PagerDuty and xMatters.
- Proficiency in CI/CD frameworks like Jenkins, GitHub Actions, or GitLab CI.
- Working knowledge of security, compliance, and performance optimization for highly available systems.
Additional Company Details:
The company values certifications and encourages candidates to pursue certifications such as AWS Certified Solutions Architect Professional, Microsoft Certified: Azure Solutions Architect Expert, Google Professional Cloud Architect, Certified Kubernetes Administrator (CKA), HashiCorp Certified: Terraform Associate, and Certified DevOps Engineer certifications (AWS, Azure, or Google) to enhance their skills and expertise in the field.