
Site Reliability Engineer, AI & Data Platforms (LLM & Kubernetes)

POWER COZMO
Full Time, Mid
New Delhi, Delhi, IN
Posted March 12, 2026

Job Description

As an AI Support Engineer at Power Cozmo (India & Jordan), based in Amman, Jordan, you will maintain and support the Kubernetes clusters behind our production AI and data systems. Your responsibilities span infrastructure support, local LLM platform maintenance, data scraping, system monitoring, incident support, and automation.

Key Responsibilities:

  • **Kubernetes & Infrastructure Support**
      • Deploy, manage, and support production-grade Kubernetes clusters
      • Troubleshoot pod, node, networking, and storage issues in live environments
      • Manage Helm charts, ConfigMaps, Secrets, and Kubernetes manifests
      • Manage infrastructure and application deployments via Git repositories
  • **Local LLM & AI Platform Support**
      • Support local LLM deployments (Ollama, Mistral, LLaVA, Qwen, etc.)
      • Troubleshoot inference performance, memory issues, and model loading failures
      • Support AI services exposed via REST APIs or internal microservices
  • **Data Scraping & Crawling**
      • Support and maintain web scraping and crawling pipelines
      • Debug data extraction failures, rate limits, and anti-bot challenges
      • Ensure data quality, consistency, and pipeline reliability
      • Assist in scheduling and monitoring crawlers (cron jobs)
  • **System Monitoring & Incident Support**
      • Perform root cause analysis for production incidents
      • Monitor logs, metrics, and alerts using tools such as Prometheus and Grafana
      • Maintain uptime and SLAs for AI and data platforms
      • Participate in on-call or escalation support rotations
  • **Automation**
      • Maintain documentation for support procedures and known issues
      • Collaborate with engineering teams on fixes and optimizations

Required Skills & Qualifications:

  • 5+ years of experience as a Data / AI Support Engineer
  • Hands-on experience with Docker and containerized workloads
  • Experience supporting local LLMs or AI inference systems
  • Practical knowledge of Linux system administration
  • Experience with data scraping or web crawling systems
  • Basic scripting skills (Python, Bash)

Preferred / Good to Have:

  • Familiarity with Neo4j, Kafka, or data pipelines
  • Experience with Ollama, Hugging Face models, or similar LLM runtimes
  • Knowledge of cloud-native monitoring tools
  • Understanding of networking concepts (DNS, ingress, load balancers)

In this role, you will work on cutting-edge AI and LLM infrastructure, both in the cloud and on local AI servers, in a fast-paced startup or scale-up environment that offers significant career growth opportunities in the field of AI.
