Site Reliability Engineer - Remote

Full Time, Mid-level
Reston, Virginia, US
Posted February 24, 2026

Job Description

ICF is a mission-driven company filled with people who care deeply about improving the lives of others and making the world a better place. Our core values include Embracing Difference; we seek candidates who are passionate about building a culture that encourages, embraces, and hires dimensions of difference.

Our Health Engineering Solutions (HES) team works side by side with customers to articulate a vision for success, and then make it happen. We know success doesn't happen by accident. It takes the right team of people, working together on the right solutions for the customer. We are looking for a seasoned SRE to establish a culture of improvement in observability and reliability.

You will work closely with software engineering teams to ensure that applications, databases, pipelines, and APIs run reliably. You will be expected to create, set, and exceed service level objectives as key indicators of application health. You will be working on a mission-critical software program whose goal is to support the ecosystem of the Centers for Medicare & Medicaid Services (CMS).

Our core work hours are 10am - 4pm Eastern Time with the option to start earlier or work later depending on your time zone.

Key Responsibilities:

  • Define and maintain SLIs, SLOs, and SLAs for the Internet-based Quality Improvement and Evaluation System (iQIES) application.
  • Tune performance by modeling load scenarios, forecasting capacity, and optimizing scaling strategies
  • Design and optimize the observability stack through New Relic, CloudWatch, and Jenkins CI/CD pipelines
  • Participate in root cause analysis for operational issues and improve the incident response process
  • Participate in creating, monitoring, and optimizing actionable alerts to respond to issues in a timely manner
  • Develop tools and scripts
  • Develop and maintain Jenkins CI/CD pipelines, using declarative Jenkinsfiles and foundational Groovy for pipeline logic and enhancements
  • Deploy services to Fargate, EKS, Lambda, Airflow, and databases
  • Manage security groups and access controls. Thoroughly understand fundamentals such as security groups, IAM, and RDS management
  • Apply patch management and hardening practices
  • Align with DevOps and Technical Leads on overall strategy
  • Actively participate in releases and product launches, with the expectation of being online during release windows

Required Qualifications

  • 5+ years of experience in a software development environment and a Bachelor’s degree; OR 3+ years of experience in a software development environment and a Master’s degree
  • 5+ years supporting a high‑availability production environment (cloud or on‑prem)
  • 3+ years working in an SRE role in a large-scale cloud environment, implementing high availability and scalability
  • 3+ years of experience focused on SRE, DevOps, or Platform Engineering
  • Must be able to obtain and maintain a public trust clearance
  • Candidate must reside in the US, be authorized to work in the US, and work must be performed in the US
  • Must have lived in the US 3 full years out of the last 5 years

Preferred Qualifications

  • Previous work in a regulated healthcare or federal agency environment
  • Full stack web development experience
  • Expertise in deployment techniques that minimize downtime, such as blue-green, canary, and A/B testing approaches, and zero-downtime deployments
  • Understanding of security groups and access controls
  • Experience with Atlassian tooling such as Jira and Confluence

Professional Skills and Tools:

  • Cloud platform experience with AWS
  • Observability: CloudWatch, New Relic or similar
  • Infrastructure: Kubernetes, Docker
  • IaC: Terraform
  • CI/CD: Git, Jenkins or GitHub Actions
  • Database: SQL relational database
  • Docker: Thorough understanding of Docker and Docker Compose, including best practices, caching, volume mounts, etc.
  • Highly effective analytical, problem-solving, and decision-making capabilities.
  • Strong written and verbal communication skills
  • Ability to clearly articulate and communicate complex technical ideas to non-SRE colleagues.
  • Ability to understand project requirements and be innovative in finding solutions in highly regulated government environments.
  • Flexibility and the ability to accept a change in priorities as necessary.
  • Demonstrated time management skills.
  • Strong organizational skills with attention to detail.

Job Location:

This position requires that the job be performed in the United States. If you accept this position, you should note that ICF does monitor employee work locations and blocks access from foreign locations/foreign IP addresses, and also prohibits personal VPN connections.

Working at ICF

ICF is a global advisory and technology services provider, but we’re not your typical consultants. We combine unmatched expertise with cutting-edge technology to help clients solve their most complex challenges, navigate change, and shape the future.

We can only solve the world's toughest challenges by
