Sr Site Reliability Engineer

FTD India Private Limited

Full Time, Senior, Hybrid

Telangana, IN

Posted March 11, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Java, Shell, Node.js, Docker, Kubernetes, Terraform, Jenkins, GitHub Actions, Linux, GitHub, CI/CD, DevOps, Microservices

Job Description

The Senior Site Reliability Engineer (SRE) will enable FTD to efficiently deliver and operate high-quality, secure software at scale. As a senior contributor, you will collaborate with cross-functional teams to promote DevOps principles and practices and implement world-class self-service. You will incubate and proliferate SRE principles and practices to ensure the stability and reliability of our commerce platforms. You will take a lead role in managing the health of applications and infrastructure.

This position supports a hybrid work model, including onsite presence in our Hyderabad, India office as needed. Occasional on-call and overtime work will be required, but it is generally not expected to be significant.

KEY RESPONSIBILITIES:

Maintain availability, performance, and scalability of critical services and production environments.
Collaborate closely with developers to design reliable applications and improve deployment practices (you will be embedded in the Development team but will report to the infrastructure leader).
Break down walls and build trust between developers and infrastructure teams.
Participate in end-to-end application ownership throughout the CI/CD process, including automated testing, observability, dependency management, and other operational concerns.
Improve CI/CD pipeline reliability, traceability, and security.
Build automation for provisioning, configuration, deployments, and incident response.
Improve observability using metrics, logs, distributed tracing, dashboards, and alerting.
Participate in on-call rotation, lead incident response, and drive root cause analysis.
Conduct capacity planning, chaos testing, and reliability reviews.
Implement infrastructure-as-code using Terraform, Helm, Jenkins, GitHub Actions, and similar tools.
Optimize CI/CD pipelines and ensure safe, repeatable deployments (e.g. with ArgoCD).
Champion SRE principles: SLIs/SLOs, error budgets, toil reduction, problem management, blameless postmortems.
Embrace a culture of enablement, customer service, continuous improvement, transparency, and fiscal responsibility.
Perform other duties as directed.

KNOWLEDGE, SKILLS AND ABILITIES:

5+ years designing, developing, delivering, and operating scalable, available, high-performance applications (Java, Node.js, etc.) and infrastructure
Bachelor's or advanced degree in Computer Science, Information Systems, or a related field
Familiarity with modern application languages and concepts, with hands-on e-commerce software development experience preferred
Google Professional Cloud Architect or similar certification desired
Advanced hands-on experience with continuous integration and delivery / deployment methodologies and technologies
Advanced experience with compute, networking, security, storage, monitoring, logging, database, and other technologies in Google Cloud Platform or a similar major cloud environment
Strong experience with containerization (e.g. Docker), Kubernetes, and Infrastructure as Code (Terraform preferred)
Working knowledge of Helm and Service Mesh (e.g. Istio)
Proficient understanding of microservices principles and orchestration
Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
Ability to effectively articulate technical concepts to audiences at all organizational levels via oral, written, and non-verbal communication
Demonstrated desire and ability to be self-directed, take ownership of issues, and establish a prominent level of credibility
Ability to work well independently and within dynamic, cross-functional teams
Excellent understanding of Internet concepts, technologies and protocols (TCP/IP, DNS, HTTP, TLS / SSL, etc.)
Experience with rapid detection and resolution of technical issues using various monitoring and application performance management tools
Proficiency with shell scripting, Python and/or other scripting languages in a Linux environment
Ability to operate effectively under pressure, both independently and in collaboration with other resources
Ability to rapidly learn new technologies via mentoring, formal training, independent research and testing
A genuine desire and willingness to share knowledge effectively with others