Principal Site Reliability Engineer (SRE)

BIG IT JOBS
Full Time, Principal
Maharashtra, IN
Posted April 17, 2026

Job Description

Privacera

Expires On: 17 May 2026

Pune City, Maharashtra, India

Job description & requirements

Role: Principal Site Reliability Engineer (SRE) – Data Platforms

Role Summary

Own reliability, support, and operations of enterprise data platforms (Trust3 AI, Snowflake, Databricks) with a primary focus on Google Cloud Platform (GCP). This is a deeply hands-on Principal SRE role combining managed services ownership, advanced production engineering, and reliability at scale.

What You’ll Do

  • Own end-to-end platform lifecycle and managed services delivery: installation, operations, upgrades, optimization, and continuous platform health

  • Take full ownership of critical production incidents with deep debugging, RCA, and permanent fixes
  • Troubleshoot complex, cross-system issues across GCP (GKE, IAM, networking), data platforms, and connectors
  • Lead performance tuning, scalability optimization, and system hardening for high-throughput systems
  • Design and implement automation across deployments, monitoring, and operations
  • Manage secrets and secure integrations using Vault (or similar) within platform and CI/CD workflows
  • Install, upgrade, and operate Trust3 AI on GCP (GKE) across multi-region environments
  • Ensure accurate and reliable enforcement of data access policies
  • Build and enhance observability (metrics, logs, alerts) for proactive issue detection
  • Eliminate operational toil through continuous reliability improvements
  • Own issues end-to-end with strong stakeholder communication and SLA adherence
  • Collaborate with Engineering and Product to resolve issues and influence platform improvements
  • Lead managed services operations including monitoring, incident prevention, capacity planning, DR readiness, and service-level outcomes (SLA, uptime, upgrade timelines)

Skills Required

  • Cloud: Strong expertise in GCP (GKE, IAM, BigQuery, GCS, VPC, Cloud Monitoring/Logging); AWS/Azure exposure is a plus
  • Data Platforms: Snowflake, Databricks, BigQuery
  • Infra & CI/CD: Kubernetes, Helm, CI/CD (GitHub Actions, GitLab CI, or similar), Terraform (preferred)
  • Scripting: Python / Bash
  • Observability: Prometheus, Grafana, ELK
  • Security: IAM, RBAC/ABAC, data governance (Trust3 AI/Ranger preferred), secrets management (Vault or similar)

Experience

  • 10+ years in SRE / DevOps / Production Engineering
  • Strong expertise in debugging distributed systems and complex production environments
  • Proven ownership of high-severity incidents and large-scale production systems
  • Demonstrated ability to independently solve ambiguous, high-impact technical problems
  • Track record of driving reliability, automation, and operational excellence at scale
  • Experience running high-throughput, always-on (24x7) systems with large data volumes and strict uptime SLAs

Why This Role

  • Principal-level, deeply hands-on IC role (no people management)
  • End-to-end ownership of mission-critical data platforms
  • Work on complex production challenges across cloud, data, and security layers
  • High impact on enterprise data access, governance, and reliability

Important Note

This is a production-first role involving end-to-end incident ownership, deep technical problem solving, and managed services operations. It is not a pure DevOps/build-only role or a people management role.

Location:

Pune City, Maharashtra, India
