Site Reliability Engineer, AI & Data Platforms (LLM & Kubernetes)

POWER COZMO

Full Time, Mid

New Delhi, Delhi, IN
Posted March 12, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Bash, Docker, Kubernetes, Linux, Git, REST, Kafka, Microservices

Job Description

As an AI Support Engineer at Power Cozmo (India & Jordan), based in Amman, Jordan, you will play a crucial role in maintaining and supporting Kubernetes clusters for our production AI and data systems. Your responsibilities will involve a range of tasks related to infrastructure support, local LLM platform maintenance, data scraping, system monitoring, incident support, and automation.

Key Responsibilities:

**Kubernetes & Infrastructure Support**
Deploy, manage, and support production-grade Kubernetes clusters
Troubleshoot pod, node, networking, and storage issues in live environments
Manage Helm charts, ConfigMaps, Secrets, and Kubernetes manifests
Manage infrastructure and application deployments via Git repositories
**Local LLM & AI Platform Support**
Support local LLM deployments (Ollama, Mistral, LLaVA, Qwen, etc.)
Troubleshoot inference performance, memory issues, and model loading failures
Support AI services exposed via REST APIs or internal microservices
**Data Scraping & Crawling**
Support and maintain web scraping and crawling pipelines
Debug data extraction failures, rate limits, and anti-bot challenges
Ensure data quality, consistency, and pipeline reliability
Assist in scheduling and monitoring crawlers (cron jobs)
**System Monitoring & Incident Support**
Perform root cause analysis for production incidents
Monitor logs, metrics, and alerts using tools such as Prometheus and Grafana
Maintain uptime and SLAs for AI and data platforms
Participate in on-call or escalation support rotations
**Automation**
Maintain documentation for support procedures and known issues
Collaborate with engineering teams for fixes and optimizations

Required Skills & Qualifications:

5+ years of experience as a Data / AI Support Engineer
Hands-on experience with Docker and containerized workloads
Experience supporting local LLM models or AI inference systems
Practical knowledge of Linux system administration
Experience with data scraping or web crawling systems
Basic scripting skills (Python, Bash)

Preferred / Good to Have:

Familiarity with Neo4j, Kafka, or data pipelines
Experience with Ollama, Hugging Face models, or similar LLM runtimes
Knowledge of cloud-native monitoring tools
Understanding of networking concepts (DNS, ingress, load balancers)

In this role, you will have the opportunity to work on cutting-edge AI and LLM infrastructure, both in the cloud and on local AI servers, in a fast-paced startup or scale-up environment, offering significant career growth opportunities in the field of AI.