Lead Site Reliability Engineer

CareFirst BlueCross BlueShield
Full Time | Lead
Baltimore, Maryland, US | $113k – $225k | Posted March 31, 2026

Job Description

Purpose

We are seeking a Lead Site Reliability Engineer who will ensure the reliability, availability, and operational integrity of the enterprise Data & Analytics ecosystem, including data platforms, interoperability services, streaming infrastructure, APIs, and analytics environments. This role serves as the primary incident commander, problem manager, and reliability leader, coordinating technical teams and business stakeholders during service disruptions while driving root cause remediation and systemic reliability improvements.

This individual operates as a cross-functional reliability authority, bridging engineering, platform, interoperability, governance, and business teams to ensure critical data services meet defined service level objectives and operational expectations.

Essential Functions

  • Incident Management and Command: Serve as the primary Incident Commander for Severity 1 and Severity 2 incidents impacting data platforms, interoperability services, APIs, and analytics systems. This role leads incident response, coordinates cross-engineering and vendor teams, assesses business impact, and establishes clear mitigation and recovery plans. The Lead SRE ensures timely stakeholder communication, accurate incident documentation, and rapid restoration of service while maintaining strong operational control.
  • Problem Management & Root Cause Analysis: Own the formal problem management lifecycle for major incidents and recurring reliability risks. This role facilitates structured, blameless root cause analysis to identify systemic technical, architectural, operational, and process failures, and ensures corrective and preventative actions are clearly defined, assigned, and tracked to completion. The focus is on eliminating repeat incidents through durable, system-level improvements rather than short-term fixes.
  • Reliability Improvement and Prevention: Partner with engineering, platform, and architecture teams to proactively strengthen reliability across data and interoperability services by identifying systemic risks and driving architectural, operational, and monitoring improvements. Establish reliability standards and operational readiness criteria, ensure services have robust monitoring/alerting/recovery, and automate recovery and failover processes to measurably reduce incident frequency and recovery time.
  • Service Reliability Governance: Establish and maintain operational reliability standards across the Data & Analytics organization by defining incident severity classification, incident response procedures, playbooks, and ensuring every service has clear ownership and operational accountability. Track reliability and operational health metrics, lead operational readiness reviews for new capabilities and deployments, and partner with change/release management to reduce operational risk.
  • Business and Stakeholder Coordination: Serve as the primary operational liaison between technical teams and business stakeholders during incidents and reliability improvement efforts, ensuring clear, timely communication of impact, status, and resolution. Provide leadership visibility into reliability risks and mitigation plans, coordinate with dependent operational teams, and support regulatory or compliance reporting as required.

Education Level

Bachelor's Degree in Information Technology or Computer Science, OR, in lieu of a Bachelor's degree, an additional 4 years of relevant work experience beyond the required work experience.

Experience

8 years of proven success overseeing the design, development, and implementation of software systems and applications.

Preferred Qualifications

  • 8+ years supporting production data processes, data & integration platforms, and distributed systems.
  • Experience managing or leading incident response for enterprise production systems.
  • Strong understanding of distributed data platforms, APIs, service orchestration, and integration architectures.
  • Experience facilitating root cause analysis and operational improvement initiatives.
  • Ability to coordinate cross-functional technical teams during high-pressure incidents.
  • Excellent written and verbal communication skills with both technical and business stakeholders.
  • Familiarity with AI and agentic workloads for supporting data and interoperability platform capabilities.

Knowledge, Skills And Abilities (KSAs)

  • Knowledge of programming languages and web-based technologies.
  • Proficient in Microsoft Office applications.
  • Ability to collaborate to solve technical problems across teams.
  • Excellent communication skills, both written and verbal.
  • Must be able to meet established deadlines and handle multiple customer service demands from internal and external customers, within set expectations for service excellence. Must be able to effectively communicate and provide positive customer service to every internal and external customer, including customers who may be demanding or otherwise challenging.

Salary Range

$113,256 - $224,939

Travel Requirements

10% in office for stakeholder and team meetings.

Salary Range Disclaimer

The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the work is being performed. This compensation range is specific and considers factors such as (but not limited to) the scope and responsibilities of the position, the candidate's work experience, education/training, internal peer equity, and market and business considerations. It is not typical for an individual to be hired at the top of the range, as compensation decisions depend on each case's facts and circumstances, including but not limited to experience, internal equity, and location. In addition to your compensation, CareFirst offers a comprehensive benefits package, various incentive programs/plans, and 401k contribution programs/plans (all benefits/incentives are subject to eligibility requirements).

Department

Data Platforms and Services

Equal Employment Opportunity

CareFirst BlueCross BlueShield is an Equal Opportunity (EEO) employer. It is the policy of the Company to provide equal employment opportunities to all qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.

Where To Apply

Please visit our website to apply: www.carefirst.com/careers

Federal Disc/Physical Demand

Note: The incumbent is required to immediately disclose any debarment, exclusion, or other event that makes him/her ineligible to perform work directly or indirectly on Federal health care programs.

Physical Demands

The associate is primarily seated while performing the duties of the position. Occasional walking or standing is required. The hands are regularly used to write, type, key and handle or feel small controls and objects. The associate must frequently talk and hear. Weights up to 25 pounds are occasionally lifted.

Sponsorship in US

Must be eligible to work in the U.S. without sponsorship.
