AI & HPC Infrastructure Engineer

Accenture
Full Time | Mid | Hybrid
Arlington, Virginia, US

Job Description

We Are:

The Global Infrastructure Engineering AI & HPC team is at the center of enabling infrastructure reinvention for the next era of digital solutions powered by AI and High-Performance Computing (HPC). We bring together deep technical expertise across cloud, on-prem, and hybrid environments to design, build, and operate accelerated infrastructure that powers high-performance workloads at scale. Our solutions enable some of our most strategic and mission-critical clients to unlock new levels of performance, efficiency, and innovation. Our remit spans the full lifecycle—from strategy and architecture through implementation and operations—driving modernization across the entire infrastructure stack. We collaborate across the ecosystem to harness emerging technologies, fuel growth, and transform industries. In this rapidly growing market, our team is leading the way in shaping how enterprises leverage AI and HPC to drive breakthrough innovation and reimagine what’s possible in infrastructure.

Key Responsibilities:

  • Design and implement HPC and AI infrastructure solutions, aligning system architecture and deployment roadmaps to industry-specific performance and scalability needs
  • Deploy, configure, and manage XPU-based clusters (CPU/GPU/accelerators) using schedulers such as Slurm, VM/Kubernetes orchestration platforms, and containerized platforms in scalable designs to provide Metal as a Service (MaaS), GPUaaS, AIaaS, and other offerings
  • Optimize cluster performance, scalability, energy, and cost efficiency across on-premises, cloud, and hybrid environments
  • Integrate AI and HPC platforms with existing IT systems, data pipelines, and security frameworks
  • Monitor, troubleshoot, and tune infrastructure to ensure high availability, low-latency networking, and workload resiliency
  • Develop and maintain documentation including architecture diagrams, configuration baselines, and operational runbooks
  • Provide technical guidance and support to users, enabling efficient execution of HPC/AI workloads, large-scale models, and simulations

Travel may be required for this role. The amount of travel will vary from 25% to 100% depending on business need and client requirements.

Required Skills and Qualifications:

  • Minimum of 4 years of hands-on experience designing, deploying, and managing HPC and AI infrastructure across on-premises, cloud, and hybrid environments in two or more segments (hyperscaler, neocloud, large enterprise, telco/mobile), supporting key industries such as Financial Services, Life Sciences, Manufacturing, and Retail
  • Minimum of 4 years of experience with accelerated computing architectures (GPUs, XPUs, DPUs), high-performance fabrics (InfiniBand, Ethernet), SONiC, networking, and modern storage/data platforms (e.g., NVMe-oF, Lustre, GPFS, BeeGFS, VAST, DDN, Weka) to build robust solutions
  • Minimum of 4 years of experience with cluster management and orchestration (e.g., Slurm, Run:ai, Kubernetes, Docker), real-time performance monitoring, and observability frameworks
  • Minimum of 4 years of experience with cloud and virtualization platforms (e.g., AWS, Azure, GCP, VMware, Nutanix) and expertise in automation and optimization using scripting (Python, AI tools) with foundational Infrastructure-as-Code tools such as Terraform and Ansible
  • Minimum of 4 years of experience implementing MLOps and DevSecOps frameworks to enable secure, automated, and reproducible workflows
  • Bachelor's degree or equivalent work experience (minimum 12 years). With an Associate's degree, a minimum of 6 years of work experience is required

Preferred Skills and Qualifications:

  • Experience managing the deployment of 1,000+ GPU clusters for HPC and AI workloads with various infrastructure services enabled
  • Experience with GPU computing libraries and accelerators (e.g., NVIDIA CUDA, Dynamo, AMD ROCm)
  • Experience with AI and HPC networking (e.g., RoCE, InfiniBand, multi-planar/multi-rail designs, platform buffer architectures)
  • Knowledge of Machine Learning and AI frameworks (e.g., TensorFlow, PyTorch, JAX), Jupyter notebooks / Google Colab environments
  • Experience with HPC & AI workload management and optimization techniques
  • Familiarity with DevOps practices and tools (e.g., Ansible, Terraform) for infrastructure automation
  • Industry certifications in NVIDIA infrastructure, public cloud providers, Data Science, etc. are a plus

Compensation at Accenture varies depending on a wide array of factors, which may include but are not limited to the specific office location, role, skill set, and level of experience. As required by local law, Accenture provides a reasonable range of compensation for roles that may be hired as set forth below.

We anticipate this job posting will be posted on 01/27/2026 and open for at least 3 days.

Accenture offers a market competitive suite of benefits including medical, dental, vision, life, and long-term disability coverage, a 401(k) plan, bonus opportunities, paid holidays, and paid time off.
