Role Overview
ECS is hiring a Senior Site Reliability Engineer. This is a full-time role in Fairfax, VA, and is part of ECS's DevOps hiring. Full responsibilities and required qualifications are listed in the job description below.
Job Description
ECS is seeking a Senior Site Reliability Engineer to work in our Fairfax, VA office.
ECS is seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems by improving awareness of, and visibility into, their security posture and cyber threats. ECS is responsible for designing, building, deploying, operating, and maintaining a complete 'Data Services' solution that collects, normalizes, visualizes, and shares cyber data from more than 100 Federal agencies. The CDM Data Services product combines multiple Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code into an integrated solution tailored to meet Department of Homeland Security (DHS) requirements.
We are seeking professionals who thrive in a dynamic, fast-paced, and highly collaborative environment where problem-solving, critical thinking, and a holistic approach to serving the mission are key. Our program operates within the Scaled Agile Framework (SAFe). An aptitude and enthusiasm for continuous learning, improvement, and cybersecurity are a must!
Role & Responsibilities:
ECS is seeking a talented Senior Site Reliability Engineer (SRE) to play a key role in defining, implementing, and growing our SRE practice to ensure the reliability, availability, and performance of our critical production environments.
The Senior SRE will contribute to a culture of continuous improvement, identifying areas for enhancement, and driving initiatives to improve system reliability, scalability, and efficiency.
The successful candidate will have demonstrated hands-on experience designing, implementing, and maintaining solutions to ensure that systems, including infrastructure and applications, are resilient, highly available, and performant. The Senior SRE will also play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution.
The Senior SRE will be responsible for setting up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services. Additionally, they will respond to incidents, perform root cause analyses, and implement solutions to prevent recurrences. The Senior SRE will work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle.
Required Qualifications:
- Must be a US citizen with the ability to obtain Public Trust Suitability
- 6+ years of experience as a Site Reliability Engineer (SRE) or in an equivalent role
- 6+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting
- 6+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk)
- 3+ years of experience defining and measuring SLOs and SLIs
- 3+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
- 3+ years of hands-on programming or scripting experience (e.g., Python, Bash)
- Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
- Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle
- Strong problem-solving and analytical skills
- Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements
- Proficiency in developing synthetic monitoring scripts in TypeScript
Frequently Asked Questions
How do I apply for the Senior Site Reliability Engineer position at ECS?
Submit your application directly to ECS through the employer's official application system. Most applications take less than 5 minutes if your resume and contact details are ready.
Where is the Senior Site Reliability Engineer position at ECS located?
This position is based in Fairfax, VA. ECS has not indicated remote or hybrid options for this role, so candidates should plan for on-site work.
What does a Senior Site Reliability Engineer at ECS earn?
ECS has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.
When was the Senior Site Reliability Engineer role at ECS posted?
This role was posted on April 1, 2026.
How much experience does the Senior Site Reliability Engineer role at ECS require?
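This is a senior-level position. ECS asks for 6+ years of experience as a Site Reliability Engineer or in an equivalent role, 6+ years each with observability solutions and SRE tools, and 3+ years each in defining SLOs and SLIs, using cloud platforms, and programming or scripting. Review the must-have qualifications in the job description closely before applying.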