Job Description
Role Overview:
You will be joining Okta as a Staff Site Reliability Engineer in a highly technical role with a focus on Splunk and Grafana. Your main responsibility will be to own and enhance the observability ecosystem by architecting a scalable telemetry platform and automating infrastructure deployment using tools like Terraform. Your work will be crucial in ensuring the performance, cost-efficiency, and integration of logging architecture with automated workflows.
Key Responsibilities:
- Lead the design and optimization of Splunk environments, including indexer performance, search efficiency, and data models to enable rapid troubleshooting and cost-effectiveness.
- Architect and maintain advanced Grafana dashboards that consolidate disparate data sources for real-time system health insights.
- Design, build, and scale observability infrastructure through tools like Terraform.
- Optimize the collection, processing, and storage of telemetry data to ensure reliability and low latency.
- Develop custom Splunk workflows and integrations for automated responses to system events, reducing Mean Time to Resolution (MTTR).
- Participate in on-call rotations and lead post-incident reviews to drive improvements through "observability-driven development."
Qualifications Required:
- Deep hands-on experience with Splunk administration, search optimization, and architecting complex data pipelines.
- Proven ability to create actionable Grafana dashboards for operational insights.
- 8+ years of experience in SRE, DevOps, or Systems Engineering with a focus on high-availability systems.
- Strong coding skills in Go, Python, or Ruby for internal tools and automation.
- Hands-on experience with OpenTelemetry, Prometheus, or similar frameworks.
- Proficiency in Linux internals, networking, and container orchestration.
Company Details:
Okta is the leading independent provider of enterprise identity solutions, enabling secure connections between people and technology. With a focus on innovation and over 7,000 pre-built integrations, Okta helps organizations securely connect the right people to the right technologies. Okta is trusted by over 19,300 organizations, and its mission is to protect the identities of workforces and customers.