Staff Technical Program Manager – GenAI Ops & Capacity Planning

Databricks
Full Time, Manager
Mountain View, California
Posted January 30, 2026

Job Description

P-1489

About Databricks

At Databricks, we are passionate about enabling data teams to solve the world’s toughest problems — from making the next mode of transportation a reality to accelerating medical breakthroughs. We do this by building and operating the world’s best data and AI infrastructure platform so our customers can turn deep data insights into real business impact. Founded by engineers and deeply customer-obsessed, we thrive on solving hard technical challenges, from next-generation data experiences to operating infrastructure at massive global scale. And we’re only getting started. For more information, visit www.databricks.com.

The Role

Databricks is looking for a Staff Technical Program Manager to drive GenAI Operations and Capacity Planning for our large-scale LLM and GPU-backed platform. This role is designed for a senior, hands-on TPM who thrives in technically deep, data-driven environments and enjoys owning complex operational programs end to end.

As a Staff TPM, you will own execution for critical GenAI operational initiatives, operate with significant autonomy, and partner closely with AI/ML engineering, infrastructure, finance, partner ops, and cloud/LLM providers. You will use strong analytical skills to guide decisions, surface risks, and continuously improve how Databricks launches, scales, and governs GenAI workloads.

You will report to a Technical Program Leader and operate across multiple time zones in a fast-moving, highly ambiguous environment.

What You’ll Do

GenAI & LLM Operations

  • Plan and execute day-0 launches of new LLMs on Databricks, ensuring production readiness across engineering, commercialization, go-to-market, legal, and cloud service partners.
  • Partner with AI/ML and platform engineering teams to operationalize LLM onboarding, rollout, and lifecycle management.
  • Define and maintain launch checklists, operational runbooks, and success metrics for GenAI workloads.

GPU & LLM Capacity Planning

  • Own GPU and LLM capacity planning, forecasting, and allocation for GenAI workloads.
  • Build and maintain SQL-driven analytical models and dashboards to forecast demand, track utilization, and surface capacity risks.
  • Balance customer demand, growth trajectories, and contractual commitments to inform short- and medium-term capacity decisions.

Utilization, Efficiency & Analytics

  • Track and drive efficient consumption of GPU and LLM capacity, identifying underutilization, contention, and inefficiencies.
  • Define and monitor KPIs for utilization, efficiency, and reliability of GenAI platforms.
  • Use data to recommend improvements to engineering roadmaps, operational processes, and cost optimization efforts.

Governance, Controls & Reporting

  • Execute governance mechanisms to ensure GenAI capacity usage aligns with contractual, financial, and compliance requirements.
  • Produce clear, data-backed reporting for senior leaders on capacity health, utilization trends, and operational risks.
  • Generate consumption reports, usage metrics reports, and share-of-wallet attestations.
  • Ensure documentation, controls, and processes are audit-ready and consistently followed.

What We Look For

Minimum Qualifications

  • 10+ years of overall industry experience, including 7+ years in Technical Program Management.
  • Experience leading cross-functional GenAI, AI/ML, or infrastructure programs from planning through launch and steady-state operations.
  • Strong background in capacity planning, forecasting, and infrastructure analytics.
  • Advanced SQL skills.
