Job Description
Company Description
Shore is an IT and strategy consulting firm focusing on innovation in the public sector. We deliver services and tools that advance public sector organizations and the services they provide.
Shore’s working environment is flexible, collaborative, and down to earth. We work hard and deliver exceptional quality, but don’t take ourselves too seriously in doing so.
What it’s like to work at Shore:
- Flexible culture and working environment
- Opportunities to learn and advance
- Contribute to innovative projects
- Be encouraged to bring your own ideas forward
Job Description
Reporting to the Director, Platform Services, the Lead DevOps Engineer is a senior, hands-on technical leader responsible for building, operating, and continuously improving Afflo’s production and non-production cloud environments, CI/CD pipelines, observability, and operational toolchain. This role translates the VP’s operational strategy, standards, and compliance objectives into reliable, scalable, secure implementation while leading execution across the DevOps function day to day.
You will work closely with Product Engineering, QA, Service Management, Implementation/Project Delivery, and external vendors to ensure Afflo services meet uptime, performance, security, and audit expectations in regulated healthcare contexts. You will also mentor other DevOps engineers, lead incident response and prevention work, and drive practical improvements that reduce operational risk and accelerate safe delivery.
This role is demanding and diverse, involving:
- Operational ownership of cloud infrastructure and delivery pipelines
- Release engineering and environment lifecycle management
- Observability, incident leadership, and continuous improvement
- Security controls, evidence readiness, and DR/BCP execution
- Tooling automation that reduces toil and improves team productivity
Responsibilities
Operational Ownership
- Own the reliability and day-to-day operation of Afflo environments (production and non-production), ensuring uptime, performance, responsiveness, and strong operational hygiene.
- Lead triage, mitigation, and restoration during incidents; coordinate with Service Management and engineering stakeholders through resolution.
- Conduct and author post-incident reviews and drive prevention work to reduce recurrence, improve MTTR, and increase change safety.
- Establish and maintain on-call standards, escalation paths, maintenance practices, and operational runbooks aligned with IT Operations and System Administration policies.
Cloud Infrastructure Engineering (IaC-First)
- Design, build, and maintain secure, resilient cloud infrastructure using Infrastructure as Code (IaC) with reusable modules, review discipline, and predictable environment patterns.
- Build and improve environment lifecycle workflows (provision, reset, clone, teardown) for QA/UAT/demo/customer environments and internal team needs.
- Implement secure-by-default patterns: network segmentation, least privilege, secrets handling, encryption, audit logging, and access reviews.
- Perform capacity planning and cost optimization, balancing availability, scalability, and operating cost, and provide actionable recommendations to the VP of Delivery.
- Design, set up, and maintain AI-specific workloads and pipelines (e.g., data processing, model training, and inference).
CI/CD, Release Engineering, and Delivery Enablement
- Build and maintain automated CI/CD pipelines to enable rapid, safe deployments, including release gates, automated checks, artifact integrity, and rollback readiness.
- Participate in and/or lead major release windows and maintenance deployments; ensure readiness checks, comms coordination, and post-release verification.
- Standardize release processes across teams/products to reduce variance, improve predictability, and support project timelines and SLAs.
- Partner with Product Engineering and QA to improve test reliability, deployment quality, and developer experience.
Observability and Monitoring
- Implement and maintain monitoring, alerting, logging, and dashboards that provide actionable signals for availability, performance, security, and data integrity.
- Reduce alert noise and improve detection coverage through tuning, SLO/SLI development, and automated verification checks.
- Provide operational insights to engineering teams using logs and metrics to identify trends, performance constraints, and failure patterns.
Security, Compliance, and Audit Readiness Enablement
- Implement operational controls and evidence-producing mechanisms aligned with IT Operations policies and selected frameworks (e.g., SOC2/ISO-aligned practices).
- Support security and governance requests by producing operational materials (diagrams, environment descriptions, safeguards, maintenance practices) and operational evidence in a timely manner.
- Coordinate with Security/Service Management on vulnerability management, patching practices, vendor security events, and operational monitoring requirements.
- Contribute to disaster recovery and business continuity readiness by maintaining runbooks, validating backups/restores, and participating in recovery exercises/tabletop tests.
Tooling, Internal Enablement, and Cross-Team Support
- Support onboarding/offboarding and access provisioning across enterprise tools (email, document storage, chat, ticketing, VPN, dev/QA environment access), emphasizing least privilege and traceability.
- Build automation/scripts to streamline frequent employee tasks and reduce operational toil.
- Maintain a clear, prioritized operational ticket pipeline; triage requests, track outcomes, and communicate progress and risks.
Vendor Collaboration and Operational Toolchain
- Work with vendors and internal stakeholders to procure, configure, and maintain operational tooling (hosting, monitoring, backups, authentication services, pipeline tools).
- Coordinate vendor-driven maintenance/outages and ensure internal and customer-facing communications occur when required.
- Provide practical input on tool selection and implementation feasibility, aligned to the VP’s standards and roadmap.
Leadership Within the DevOps Function
- Mentor DevOps engineers through pairing, code/IaC reviews, incident coaching, and documentation/runbook development.
- Raise team maturity by defining “how we do it here”: templates, standards, checklists, guardrails, and repeatable operational processes.
- Serve as senior escalation for complex infrastructure/pipeline issues and lead cross-team problem-solving efforts.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 7–10+ years of progressive experience in DevOps/SRE/infrastructure engineering, including ownership of production systems.
- Strong Linux, networking, and troubleshooting skills across distributed systems.
- Advanced experience with cloud environments (Azure and/or GCP preferred; multi-cloud exposure is an asset).
- Expert-level Infrastructure as Code experience (e.g., Terraform/Pulumi), including modular design, review practices, and safe change management.
- Strong Kubernetes experience (operations, deployments, security posture, cluster/platform troubleshooting).
- Strong experience with designing and building AI training and inference workflows in a cloud environment.
- Proven CI/CD and release engineering experience (e.g., GitLab CI, Jenkins, ArgoCD or equivalent), including quality gates and safe deployment strategies.
- Proven experience with software development lifecycle (SDLC) methodologies and best practices.
- Experience with IT Service Management (ITSM) tools (ServiceNow, JIRA Service Management, BMC Remedy) and Kanban project management (JIRA Software or equivalent).
- Demonstrated incident leadership (on-call participation, incident coordination, RCA authorship, prevention follow-through).
- Security-minded approach: least privilege, secrets management, vulnerability management, audit logging, and regulated-environment operational discipline.
- Excellent written and verbal communication skills; strong documentation habits (knowledge base, runbooks, diagrams, procedures).
- Ability to work under deadlines, switch contexts quickly, and deliver across multiple initiatives.
Additional Information
Nice-to-Haves
- Experience supporting regulated healthcare or PHI-adjacent environments and governance expectations.
- Experience supporting SOC2/ISO-style audits (evidence, control operation, policy-driven operations).
- Familiarity with internal IT tooling and identity/access systems (e.g., SSO, VPN, device management patterns).
- Experience building internal developer platforms or “golden path” delivery tooling.
About Shore Consulting
Shore Consulting
shore-consulting.com