Job Description
Join as a Site Reliability Engineer to enhance service reliability and operational efficiency. Work with a remote team committed to innovative solutions and continuous improvement in the cloud environment. In this role, you will refine system design and directly impact operational success. From managing incident response to leading post-mortems, your work will keep the infrastructure scalable and resilient. You'll play an essential part in maintaining service availability and improving system observability.

Key Responsibilities:
- Enhance infrastructure performance and scalability
- Manage incidents and automate manual processes
- Respond to alerts with on-call support
- Define SLIs, SLOs, and error budgets
- Improve monitoring, alerting, and documentation

Requirements:
- Proven experience with AWS or similar platforms
- Skilled in chaos engineering techniques
- Ability to debug live systems effectively
- Experience with programming and scripting languages
- Proactive mindset in a dynamic work environment

Drive innovation and excellence, ensuring operational readiness in a growing system.