Lead Principal Site Reliability Engineer for Cloud Infrastructure

Parallel Domain

Full Time, Lead

Bella Coola, British Columbia, CA

Posted 7 weeks ago

Job Description

Drive the reliability and performance of cloud systems as a Principal Site Reliability Engineer. Elevate AWS infrastructure for demanding workloads in autonomous vehicle development while collaborating closely with engineering teams.

This high-ownership role draws on deep AWS and Kubernetes expertise: you will oversee EKS operations, support deployments, and manage cloud security. You will also play a key part in incident response, improve our monitoring capabilities, and implement best practices across cloud environments to maintain high availability for enterprise customers.

Key Responsibilities:

Own AWS infrastructure and improve performance
Manage EKS cluster operations for production
Support GitOps deployment and infrastructure-as-code
Design automated remediation systems to reduce MTTR
Lead security governance and IAM management

Requirements

5+ years in SRE, DevOps, or infrastructure roles
Proficiency with Terraform and multi-environment patterns
Deep experience with AWS services and Kubernetes
Solid networking expertise in cloud environments
Comfort with Python and Bash scripting

Leverage your expertise to shape a reliable, secure cloud infrastructure that supports critical simulation workloads in an advanced technology domain.
