Sr Site Reliability Engineer
FTD India Private Limited
Job Description
The Senior Site Reliability Engineer (SRE) will enable FTD to efficiently deliver and operate high-quality, secure software at scale. As a senior contributor, you will collaborate with cross-functional teams to promote DevOps principles and practices and implement world-class self-service. You will incubate and proliferate SRE principles and practices to ensure the stability and reliability of our commerce platforms. You will take a lead role in managing the health of applications and infrastructure.
This position supports a hybrid work model, including onsite presence in our Hyderabad, India office as needed. Occasional on-call and overtime work will be required, but it is generally not expected to be significant.
KEY RESPONSIBILITIES:
- Maintain availability, performance, and scalability of critical services and production environments.
- Collaborate closely with developers to design reliable applications and improve deployment practices (you will be embedded in the Development team but will report to the infrastructure leader).
- Break down walls and build trust between developers and infrastructure teams
- Participate in end-to-end application ownership throughout the CI/CD process, including automated testing, observability, dependency management, and other operational concerns.
- Improve CI/CD pipeline reliability, traceability, and security
- Build automation for provisioning, configuration, deployments, and incident response.
- Improve observability using metrics, logs, distributed tracing, dashboards, and alerting.
- Participate in on-call rotation, lead incident response, and drive root cause analysis.
- Conduct capacity planning, chaos testing, and reliability reviews.
- Implement infrastructure-as-code using Terraform, Helm, Jenkins/GitHub Actions/etc.
- Optimize CI/CD pipelines and ensure safe, repeatable deployments (e.g. Argo CD).
- Champion SRE principles: SLIs/SLOs, error budgets, toil reduction, problem management, blameless postmortems.
- Embrace a culture of enablement, customer service, continuous improvement, transparency, and fiscal responsibility
- Perform other duties as directed
KNOWLEDGE, SKILLS AND ABILITIES:
- 5+ years designing, developing, delivering, and operating scalable, available, high-performance applications (Java, Node.js, etc.) and infrastructure
- Bachelor's or advanced degree in Computer Science, Information Systems, or a related field
- Familiarity with modern application languages and concepts, with hands-on e-commerce software development experience preferred
- Google Professional Cloud Architect or similar certification desired
- Advanced hands-on experience with continuous integration and delivery / deployment methodologies and technologies
- Advanced experience with compute, networking, security, storage, monitoring, logging, database, and other technologies in Google Cloud Platform or a similar major cloud environment
- Strong experience with containerization (e.g. Docker), Kubernetes, and Infrastructure as Code (Terraform preferred)
- Working knowledge of Helm and Service Mesh (e.g. Istio)
- Proficient understanding of microservices principles and orchestration
- Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
- Ability to effectively articulate technical concepts to audiences at all organizational levels through oral, written, and non-verbal communication
- Demonstrated desire and ability to be self-directed, take ownership of issues, and establish a prominent level of credibility
- Ability to work well independently and within dynamic, cross-functional teams
- Excellent understanding of Internet concepts, technologies and protocols (TCP/IP, DNS, HTTP, TLS / SSL, etc.)
- Experience with rapid detection and resolution of technical issues using various monitoring and application performance management tools
- Proficiency with shell scripting, Python and/or other scripting languages in a Linux environment
- Ability to operate effectively under pressure, both independently and in collaboration with other resources
- Ability to rapidly learn new technologies via mentoring, formal training, independent research and testing
- A genuine desire and willingness to share knowledge effectively with others