Site Reliability Engineer (Remote)

Jobs Ai
Full Time | Mid-level
CA | Posted April 27, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Kubernetes, Ansible, Jenkins, Linux, Agile

Job Description

  • Role: Site Reliability Engineer (Remote)
  • Location: Remote (Work from Anywhere)
  • Payout: Competitive
  • Industry: Technology, Artificial Intelligence, Data & Analytics
  • Job Function: Engineering, Information Technology, Research

Role Overview:

One of our clients, a global leader in the technology industry, is seeking a skilled Site Reliability Engineer to ensure the performance, reliability, and scalability of mission-critical infrastructure. This is a remote contractor position that draws on expertise in Linux, Kubernetes, and Prometheus to architect, monitor, and improve robust systems. As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure that supports innovative applications.

Key Responsibilities:

  • Design, implement, and maintain scalable infrastructure using Linux, Kubernetes, and Prometheus to support mission-critical applications
  • Monitor system health, analyze performance metrics, and proactively address bottlenecks or potential failures to ensure high system reliability
  • Automate operational processes with scripts and automation frameworks to minimize manual intervention and increase system reliability
  • Respond swiftly to incidents, conduct root cause analysis, and drive continuous improvements in incident response procedures to ensure high system availability
  • Collaborate closely with development and operations teams to deliver seamless deployments and high system reliability, using agile methodologies and collaboration tools

Required Skills & Qualifications:

  • Deep expertise in Linux, Kubernetes, and Prometheus, with experience in designing and implementing scalable infrastructure
  • Strong understanding of system performance metrics, monitoring, and analysis, with experience in using tools such as Grafana and Prometheus
  • Experience in automating operational processes using scripting and automation frameworks, such as Ansible and Jenkins
  • Strong problem-solving skills, with experience in root cause analysis and incident response
  • Excellent collaboration and communication skills, with experience in working with development and operations teams

More About the Opportunity:

This role offers the opportunity to work with a global leader in the technology industry, contributing to the development of innovative applications and systems. The successful candidate will have the chance to leverage their expertise in Linux, Kubernetes, and Prometheus to make a significant impact on the performance, reliability, and scalability of mission-critical infrastructure.

Equal Opportunity Employer:

We hire based on skills and expertise. All qualified candidates are welcome regardless of background, experience, or prior employment history. Applications are reviewed solely on demonstrated technical ability and qualifications.
