GCP Cloud & VertexAI Infrastructure Engineer
VARITE INDIA PRIVATE LIMITED
Job Description
Company Name: VARITE India Private Limited
About The Client:
The Client is a multinational information technology (IT) services and consulting company headquartered in Tokyo, Japan. The Client offers a wide array of IT services, including application development, infrastructure management, and business process outsourcing. Their consulting services span business and technology, while their digital solutions focus on transformation and user experience design. The Client excels in data and intelligence services, emphasizing analytics, AI, and machine learning. Additionally, their cybersecurity, cloud, and application services round out a comprehensive portfolio designed to meet the diverse needs of businesses worldwide.
About The Job:
- We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, Client Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions.
- You will own the end-to-end infrastructure lifecycle — from design and provisioning to automation, monitoring, and optimization — while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
Essential Job Functions:
- Cloud Infrastructure & Platform Engineering
- Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
- Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
- Configure and optimize Vertex AI and Client Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
- Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
- Ensure business continuity through backup, disaster recovery, and multi-region deployments.
- Automation & Reliability
- Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
- Adopt GitOps practices (Flux) for infrastructure lifecycle management.
- Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
- Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
- Security, Governance & Compliance
- Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
- Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
- Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
- Monitoring, Observability & Cost Optimization
- Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
- Define KPIs to monitor system health, performance, and adoption across AI workloads.
- Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
- Collaboration & Enablement
- Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
- Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
- Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Qualifications
- Experience: 6+ years of relevant experience in GCP Cloud
Mandatory Skills (non-negotiable)
- Minimum 6 years of relevant experience in GCP Cloud / Infrastructure and Kubernetes (GKE)
- Experience providing GCP support for GenAI / AI projects, especially Vertex AI from Google
- Hands-on experience in DevOps and platform engineering, specifically IaC with Terraform
- Hands-on experience with CI/CD (Git, GitLab, Jenkins)
- Any experience with Client Watsonx will be a major plus
Required Education
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
Required Skills & Experience
- 6+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering.
- Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
- Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI.
- Experience with Client Watsonx for AI application deployment and management.
- Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale.
- Proficiency in Python, Bash, or other relevant scripting languages.
- Strong understanding of cloud networking, IAM, and security best practices.
- Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
- Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
- Excellent problem-solving, debugging, and communication skills.
Preferred Experience
- Experience in MLOps practices for model deployment, monitoring, and retraining.
- Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
- Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
- Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
- Contributions to open-source projects in infrastructure, MLOps, or GenAI.
- Experience managing infrastructure in regulated industries.
Preferred Certifications:
- Google Cloud Certified - Professional Cloud Architect
- Google Cloud Certified - Professional Machine Learning Engineer
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
- Client Certified Watsonx Generative AI Engineer – Associate
- Client Certified Solution Architect - Cloud Pak for Data
- Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies
How to Apply: Interested candidates are encouraged to submit their updated resumes. For additional job opportunities, please visit Jobs In India – VARITE.
Unlock Rewards: Refer Candidates and Earn.
If you're not available or interested in this opportunity, please pass this along to anyone in your network who might be a good fit and interested in our open positions. VARITE offers a Candidate Referral program, where you'll receive a one-time referral bonus based on the following scale if the preferred candidate completes a three-month assignment with VARITE.
Referral Bonus by Experience Level:
- 0-2 years: INR 5,000
- 2-6 years: INR 7,500
- 6+ years: INR 10,000
About VARITE: VARITE is a global staffing and IT consulting company providing technical consulting and team augmentation services to Fortune 500 companies in the USA, UK, Canada, and India. VARITE is currently a primary and direct vendor to leading corporations in the verticals of Networking, Cloud Infrastructure, Hardware and Software, Digital Marketing and Media Solutions, Clinical Diagnostics, Utilities, Gaming and Entertainment, and Financial Services.
Equal Opportunity Employer:
VARITE is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, marital status, veteran status, or disability status.