Platform / DevOps / Site Reliability Engineer (SRE)
AABM Cloud Data Solutions
Job Description
Location: Hyderabad, India / Remote
Shift: Night Shift (Standard US Hours)
Experience Level: 7+ years
Employment Type: Full-time
Role Overview
We are looking for a highly skilled Lead Platform/DevOps Engineer to manage, automate, and scale our cloud-native infrastructure. You will be responsible for the entire Software Development Life Cycle (SDLC), focusing on infrastructure-as-code, container orchestration, and GitOps methodologies to ensure high availability and operational excellence.
Key Responsibilities
1. Cloud & Infrastructure Management
- Design and Deploy: Architect and manage robust AWS environments utilizing EC2, VPC, IAM, S3, RDS, and Lambda.
- Infrastructure as Code (IaC): Develop and maintain complex Terraform templates for development, staging, production, and disaster recovery environments.
- Security & Compliance: Implement security best practices, including IAM key rotation, AWS Secrets Manager, encryption using KMS, and Service Control Policies (SCPs).
2. Orchestration & Containerization
- Kubernetes Management: Design and manage large-scale Kubernetes clusters (EKS/AKS) supporting 100+ microservices.
- Application Packaging: Implement Helm charts for application packaging and environment-specific configurations.
- Service Mesh: Implement and manage Istio for secure microservices communication and traffic management.
3. CI/CD & Automation
- Pipeline Development: Build end-to-end CI/CD automation using Jenkins, GitHub Actions, and Azure DevOps with YAML-based pipelines.
- GitOps Implementation: Integrate Argo CD for continuous deployment to ensure consistent and reliable rollouts across clusters.
- Developer Experience: Configure developer portals (e.g., Backstage) for self-service deployments and automated provisioning.
4. Site Reliability & Observability
- Monitoring & Alerting: Set up Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana) for system-wide observability.
- Performance Engineering: Conduct performance testing using JMeter to identify bottlenecks and optimize application speed.
- Cost Management: Apply resource optimization strategies and use tools like Kubecost to reduce cloud costs (e.g., via Spot Instances).
Pay: ₹1,500,000.00 - ₹2,500,000.00 per year
Experience
- AWS: 8 years (Required)
- Docker, Kubernetes (EKS/AKS/ROSA): 8 years (Required)
- CI/CD Tools: 8 years (Required)
Shift availability:
- Night Shift (Required)
- Overnight Shift (Required)
Work Location: Remote