Principal Site Reliability Engineer (SRE)

Maharashtra, IN

Posted 7 weeks ago

Role Overview

BIG IT JOBS is hiring a Principal Site Reliability Engineer (SRE). This is a full-time role in Maharashtra and part of BIG IT JOBS's Lifecycle hiring. Full responsibilities, required qualifications, and the apply link are listed in the description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring:

Python, Bash, AWS, GCP, Azure, Kubernetes, Terraform, GitHub Actions

Job Description

Principal Site Reliability Engineer (SRE)

Privacera

3 hours ago

Expires On: 17 May 2026

Pune City, Maharashtra, India

Apply now

Job description & requirements

Role: Principal Site Reliability Engineer (SRE) – Data Platforms

Role Summary

Own reliability, support, and operations of enterprise data platforms (Trust3 AI, Snowflake, Databricks) with a primary focus on Google Cloud Platform (GCP). This is a deeply hands-on Principal SRE role combining managed services ownership, advanced production engineering, and reliability at scale.

What You’ll Do

Own end-to-end platform lifecycle and managed services delivery: installation, operations, upgrades, optimization, and continuous platform health

Take full ownership of critical production incidents with deep debugging, RCA, and permanent fixes
Troubleshoot complex, cross-system issues across GCP (GKE, IAM, networking), data platforms, and connectors
Lead performance tuning, scalability optimization, and system hardening for high-throughput systems
Design and implement automation across deployments, monitoring, and operations
Manage secrets and secure integrations using Vault (or similar) within platform and CI/CD workflows
Install, upgrade, and operate Trust3 AI on GCP (GKE) across multi-region environments
Ensure accurate and reliable enforcement of data access policies
Build and enhance observability (metrics, logs, alerts) for proactive issue detection
Eliminate operational toil through continuous reliability improvements
Own issues end-to-end with strong stakeholder communication and SLA adherence
Collaborate with Engineering and Product to resolve issues and influence platform improvements
Lead managed services operations including monitoring, incident prevention, capacity planning, DR readiness, and service-level outcomes (SLA, uptime, upgrade timelines)

Skills Required

Cloud: Strong expertise in GCP (GKE, IAM, BigQuery, GCS, VPC, Cloud Monitoring/Logging); AWS/Azure exposure is a plus
Data Platforms: Snowflake, Databricks, BigQuery
Infra & CI/CD: Kubernetes, Helm, CI/CD (GitHub Actions, GitLab CI, or similar), Terraform (preferred)
Scripting: Python / Bash
Observability: Prometheus, Grafana, ELK
Security: IAM, RBAC/ABAC, data governance (Trust3 AI/Ranger preferred), secrets management (Vault or similar)

Experience

10+ years in SRE / DevOps / Production Engineering
Strong expertise in debugging distributed systems and complex production environments
Proven ownership of high-severity incidents and large-scale production systems
Demonstrated ability to independently solve ambiguous, high-impact technical problems
Track record of driving reliability, automation, and operational excellence at scale
Experience running high-throughput, always-on (24x7) systems with large data volumes and strict uptime SLAs

Why This Role

Principal-level, deeply hands-on IC role (no people management)
End-to-end ownership of mission-critical data platforms
Work on complex production challenges across cloud, data, and security layers
High impact on enterprise data access, governance, and reliability

Important Note

This is a production-first role involving end-to-end incident ownership, deep technical problem solving, and managed services operations, not a pure DevOps/build-only or people management role.

Location:

Pune City, Maharashtra, India

Frequently Asked Questions

How do I apply for the Principal Site Reliability Engineer (SRE) position at BIG IT JOBS?

Use the Apply button above to submit your application directly to BIG IT JOBS. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Where is the Principal Site Reliability Engineer (SRE) position at BIG IT JOBS located?

This position is based in Maharashtra. BIG IT JOBS has not indicated remote or hybrid options for this role, so candidates should plan for on-site work.

What does a Principal Site Reliability Engineer (SRE) at BIG IT JOBS earn?

BIG IT JOBS has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.

When was the Principal Site Reliability Engineer (SRE) role at BIG IT JOBS posted?

This role was posted on April 17, 2026 (51 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.
