Lead Site Reliability Engineer
CareFirst BlueCross BlueShield
Role Overview
CareFirst BlueCross BlueShield is hiring a Lead Site Reliability Engineer. This is a full-time role in Baltimore and is part of CareFirst BlueCross BlueShield's Lifecycle hiring. The posted range is $113k to $225k. Full responsibilities, required qualifications, and application details are listed in the description below.
Job Description
Purpose
We are seeking a Lead Site Reliability Engineer who will ensure the reliability, availability, and operational integrity of the enterprise Data & Analytics ecosystem, including data platforms, interoperability services, streaming infrastructure, APIs, and analytics environments. This role serves as the primary incident commander, problem manager, and reliability leader, coordinating technical teams and business stakeholders during service disruptions while driving root cause remediation and systemic reliability improvements.
This individual operates as a cross-functional reliability authority, bridging engineering, platform, interoperability, governance, and business teams to ensure critical data services meet defined service level objectives and operational expectations.
Essential Functions
- Incident Management and Command: Serve as the primary Incident Commander for Severity 1 and Severity 2 incidents impacting data platforms, interoperability services, APIs, and analytics systems. This role leads incident response, coordinates cross-functional engineering and vendor teams, assesses business impact, and establishes clear mitigation and recovery plans. The Lead SRE ensures timely stakeholder communication, accurate incident documentation, and rapid restoration of service while maintaining strong operational control.
- Problem Management & Root Cause Analysis: Own the formal problem management lifecycle for major incidents and recurring reliability risks. This role facilitates structured, blameless root cause analysis to identify systemic technical, architectural, operational, and process failures, and ensures corrective and preventative actions are clearly defined, assigned, and tracked to completion. The focus is on eliminating repeat incidents through durable, system-level improvements rather than short-term fixes.
- Reliability Improvement and Prevention: Partner with engineering, platform, and architecture teams to proactively strengthen reliability across data and interoperability services by identifying systemic risks and driving architectural, operational, and monitoring improvements. Establish reliability standards and operational readiness criteria, ensure services have robust monitoring, alerting, and recovery, and automate recovery and failover processes to measurably reduce incident frequency and recovery time.
- Service Reliability Governance: Establish and maintain operational reliability standards across the Data & Analytics organization by defining incident severity classifications, incident response procedures, and playbooks, and by ensuring every service has clear ownership and operational accountability. Track reliability and operational health metrics, lead operational readiness reviews for new capabilities and deployments, and partner with change/release management to reduce operational risk.
- Business and Stakeholder Coordination: Serve as the primary operational liaison between technical teams and business stakeholders during incidents and reliability improvement efforts, ensuring clear, timely communication of impact, status, and resolution. Provide leadership visibility into reliability risks and mitigation plans, coordinate with dependent operational teams, and support regulatory or compliance reporting as required.
Education Level
Bachelor's degree in Information Technology or Computer Science OR, in lieu of a bachelor's degree, an additional 4 years of relevant work experience beyond the required work experience.
Experience
8 years of proven success overseeing the design, development, and implementation of software systems and applications.
Preferred Qualifications
- 8+ years supporting production data processes, data & integration platforms, and distributed systems.
- Experience managing or leading incident response for enterprise production systems.
- Strong understanding of distributed data platforms, APIs, service orchestration, and integration architectures.
- Experience facilitating root cause analysis and operational improvement initiatives.
- Ability to coordinate cross-functional technical teams during high-pressure incidents.
- Excellent written and verbal communication skills with both technical and business stakeholders.
- Familiarity with AI and agentic workloads for supporting data and interoperability platform capabilities.
Knowledge, Skills And Abilities (KSAs)
- Knowledge of programming languages and web-based technologies.
- Proficient in Microsoft Office applications.
- Ability to collaborate to solve technical problems across teams.
- Excellent communication skills both written and verbal.
- Must be able to meet established deadlines and handle multiple customer service demands from internal and external customers, within set expectations for service excellence. Must be able to effectively communicate and provide positive customer service to every internal and external customer, including customers who may be demanding or otherwise challenging.
Salary Range
$113,256 - $224,939
Travel Requirements
10% in office for stakeholder and team meetings.
Salary Range Disclaimer
The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the work is being performed. This compensation range is specific and considers factors such as (but not limited to) the scope and responsibilities of the position, the candidate's work experience, education/training, internal peer equity, and market and business considerations. It is not typical for an individual to be hired at the top of the range, as compensation decisions depend on each case's facts and circumstances, including but not limited to experience, internal equity, and location. In addition to your compensation, CareFirst offers a comprehensive benefits package, various incentive programs/plans, and 401k contribution programs/plans (all benefits/incentives are subject to eligibility requirements).
Department
Data Platforms and Services
Equal Employment Opportunity
CareFirst BlueCross BlueShield is an Equal Opportunity (EEO) employer. It is the policy of the Company to provide equal employment opportunities to all qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.
Where To Apply
Please visit our website to apply: www.carefirst.com/careers
Federal Disc/Physical Demand
Note: The incumbent is required to immediately disclose any debarment, exclusion, or other event that makes him/her ineligible to perform work directly or indirectly on Federal health care programs.
Physical Demands
The associate is primarily seated while performing the duties of the position. Occasional walking or standing is required. The hands are regularly used to write, type, key and handle or feel small controls and objects. The associate must frequently talk and hear. Weights up to 25 pounds are occasionally lifted.
Sponsorship in US
Must be eligible to work in the U.S. without Sponsorship
Frequently Asked Questions
How do I apply for the Lead Site Reliability Engineer position at CareFirst BlueCross BlueShield?
Submit your application directly to CareFirst BlueCross BlueShield through the employer's careers site at www.carefirst.com/careers.
Where is the Lead Site Reliability Engineer position at CareFirst BlueCross BlueShield located?
This position is based in Baltimore. The posting lists a requirement of 10% in office for stakeholder and team meetings, so candidates should plan for occasional on-site work.
How much does the Lead Site Reliability Engineer role at CareFirst BlueCross BlueShield pay?
CareFirst BlueCross BlueShield has posted a compensation range of $113k to $225k for this position. Final offers typically vary based on candidate experience, location, and internal salary bands.
When was the Lead Site Reliability Engineer role at CareFirst BlueCross BlueShield posted?
This role was posted on March 31, 2026.