Sr Site Reliability Engineer

FTD India Private Limited
Full Time | Senior | Hybrid
Telangana, IN | Posted March 11, 2026

Job Description

The Senior Site Reliability Engineer (SRE) will enable FTD to efficiently deliver and operate high-quality, secure software at scale. As a senior contributor, you will collaborate with cross-functional teams to promote DevOps principles and practices and implement world-class self-service. You will incubate and spread SRE principles and practices to ensure the stability and reliability of our commerce platforms. You will take a lead role in managing the health of applications and infrastructure.

This position supports a hybrid work model, including onsite presence in our Hyderabad, India office as needed. Occasional on-call and overtime work will be required, but it is generally not expected to be significant.

KEY RESPONSIBILITIES:

  • Maintain availability, performance, and scalability of critical services and production environments.
  • Collaborate closely with developers to design reliable applications and improve deployment practices (you will be embedded in the development team but will report to the infrastructure leader).
  • Break down walls and build trust between developers and infrastructure teams.
  • Participate in end-to-end application ownership throughout the CI/CD process, including automated testing, observability, dependency management, and other operational concerns.
  • Improve CI/CD pipeline reliability, traceability, and security.
  • Build automation for provisioning, configuration, deployments, and incident response.
  • Improve observability using metrics, logs, distributed tracing, dashboards, and alerting.
  • Participate in on-call rotation, lead incident response, and drive root cause analysis.
  • Conduct capacity planning, chaos testing, and reliability reviews.
  • Implement infrastructure-as-code and delivery automation using Terraform, Helm, Jenkins, GitHub Actions, etc.
  • Optimize CI/CD pipelines and ensure safe, repeatable deployments (e.g. Argo CD).
  • Champion SRE principles: SLIs/SLOs, error budgets, toil reduction, problem management, blameless postmortems.
  • Embrace a culture of enablement, customer service, continuous improvement, transparency, and fiscal responsibility.
  • Perform other duties as directed.

KNOWLEDGE, SKILLS AND ABILITIES:

  • 5+ years designing, developing, delivering, and operating scalable, available, high-performance applications (Java, Node.js, etc.) and infrastructure
  • Bachelor's or advanced degree in Computer Science, Information Systems, or a related field
  • Familiarity with modern application languages and concepts, with hands-on e-commerce software development experience preferred
  • Google Professional Cloud Architect or similar certification desired
  • Advanced hands-on experience with continuous integration and delivery / deployment methodologies and technologies
  • Advanced experience with compute, networking, security, storage, monitoring, logging, database, and other technologies in Google Cloud Platform or a similar major cloud environment
  • Strong experience with containerization (e.g. Docker), Kubernetes, and Infrastructure as Code (Terraform preferred)
  • Working knowledge of Helm and Service Mesh (e.g. Istio)
  • Proficient understanding of microservices principles and orchestration
  • Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
  • Ability to articulate technical concepts effectively to audiences at all organizational levels through oral, written, and non-verbal communication
  • Demonstrated desire and ability to be self-directed, take ownership of issues, and establish a high level of credibility
  • Ability to work well independently and within dynamic, cross-functional teams
  • Excellent understanding of Internet concepts, technologies and protocols (TCP/IP, DNS, HTTP, TLS / SSL, etc.)
  • Experience with rapid detection and resolution of technical issues using various monitoring and application performance management tools
  • Proficiency with shell scripting, Python and/or other scripting languages in a Linux environment
  • Ability to operate effectively under pressure, both independently and in collaboration with other resources
  • Ability to rapidly learn new technologies via mentoring, formal training, independent research and testing
  • A genuine desire and willingness to share knowledge effectively with others
