Principal Site Reliability Engineer (SRE)

IN. Posted 9 weeks ago

Role Overview

Trust3 AI is hiring a Principal Site Reliability Engineer (SRE). This is a full-time role in IN and is part of Trust3 AI's Lifecycle hiring. Full responsibilities, required qualifications, and the apply link are listed in the description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Bash, AWS, GCP, Azure, Kubernetes, Terraform, GitHub Actions

Job Description

Role Overview:

As a Principal Site Reliability Engineer (SRE) focusing on Data Platforms, you will own the reliability, support, and operations of enterprise data platforms such as Trust3 AI, Snowflake, and Databricks, with a primary emphasis on Google Cloud Platform (GCP). This role is deeply hands-on and combines managed services ownership, advanced production engineering, and reliability at scale.

Key Responsibilities:

Own the end-to-end platform lifecycle and delivery of managed services, including installation, operations, upgrades, optimization, and ensuring continuous platform health
Take complete ownership of critical production incidents by conducting deep debugging, Root Cause Analysis (RCA), and implementing permanent fixes
Troubleshoot complex, cross-system issues across GCP (GKE, IAM, networking), data platforms, and connectors
Lead performance tuning, scalability optimization, and system hardening for high-throughput systems
Design and implement automation across deployments, monitoring, and operations
Manage secrets and secure integrations using Vault (or similar) within the platform and CI/CD workflows
Install, upgrade, and operate Trust3 AI on GCP (GKE) across multi-region environments
Ensure accurate and reliable enforcement of data access policies
Build and enhance observability (metrics, logs, alerts) for proactive issue detection
Eliminate operational toil through continuous reliability improvements
Own issues end-to-end with strong stakeholder communication and adherence to SLAs
Collaborate with Engineering and Product teams to resolve issues and influence platform improvements
Lead managed services operations including monitoring, incident prevention, capacity planning, DR readiness, and ensuring service-level outcomes (SLA, uptime, upgrade timelines)

Qualifications Required:

Cloud expertise in GCP (GKE, IAM, BigQuery, GCS, VPC, Cloud Monitoring/Logging); exposure to AWS/Azure is a plus
Familiarity with data platforms such as Snowflake, Databricks, and BigQuery
Experience with infrastructure and CI/CD tooling: Kubernetes, Helm, a CI/CD system (GitHub Actions, GitLab CI, or similar), and Terraform (preferred)
Proficiency in scripting languages like Python and Bash
Knowledge of observability tools like Prometheus, Grafana, and ELK
Understanding of security concepts including IAM, RBAC/ABAC, data governance (Trust3 AI/Ranger preferred), and secrets management (Vault or similar)

Additional Company Details:

This role is production-oriented, focusing on end-to-end incident ownership, deep technical problem-solving, and managed services operations. It does not primarily involve DevOps/build-only tasks or people management responsibilities.

Frequently Asked Questions

How do I apply for the Principal Site Reliability Engineer (SRE) position at Trust3 AI?

Use the Apply button above to submit your application directly to Trust3 AI. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Where is the Principal Site Reliability Engineer (SRE) position at Trust3 AI located?

This position is based in IN. Trust3 AI has not indicated remote or hybrid options for this role, so candidates should plan for on-site work.

What does a Principal Site Reliability Engineer (SRE) at Trust3 AI earn?

Trust3 AI has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.

When was the Principal Site Reliability Engineer (SRE) role at Trust3 AI posted?

This role was posted on April 15, 2026 (63 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.
