Job Description
Position: Innovative Site Reliability Engineer for AI Compute Infrastructure
Location: Conklin
Step into a vital role as a Site Reliability Engineer to enhance AI inference deployments. Drive operational excellence across rapidly growing datacenter environments while utilizing your advanced technical skills.
This role involves hands-on management of AI clusters and reliable software deployment within high-capacity infrastructure. You will tackle challenges in telemetry and observability and develop automated deployment pipelines that allow seamless capacity reallocation and maximize performance across systems. Your contributions will be integral to maintaining and growing our leading-edge technology.
Key Responsibilities:
- Operate across diverse datacenter environments experiencing rapid growth
- Ensure reliability of AI inference deployments at scale
- Develop solutions for telemetry and observability
- Advance deployment automation for efficient operations
- Collaborate with internal teams to translate requirements
Requirements:
- 2-5 years of experience in high-performance compute operations
- Strong automation skills with Python
- Experience with Linux systems and command-line tools
- Knowledge of Docker and Kubernetes
- Familiarity with Prometheus and Grafana for observability
Apply your expertise in AI compute infrastructure to make a significant impact on deployment efficiency and operational reliability in an innovative environment.