Description: A leading consulting firm operating in the Enterprise Generative AI and Large Language Model (LLM) services sector, delivering production-grade LLM solutions, retrieval-augmented systems, and custom generative AI products for enterprise clients across domains. The team focuses on building secure, scalable, low-latency inference services and automating model lifecycle workflows for on-prem and cloud deployments.

Position: LLM Engineer - On-site (India). We are hiring an experienced LLM engineer to design, fine-tune, and deploy LLM-based solutions that power search, summarization, agents, and domain-specific assistants.

Role & Responsibilities:
- Design, fine-tune, and validate LLMs for production use cases: instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
- Implement retrieval-augmented generation (RAG) pipelines: embeddings, vector search, chunking, and context assembly for high-recall responses.
- Optimize inference for latency and cost: quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
- Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
- Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
- Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.

Skills & Qualifications:

Must-Have:
- 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
- Proficiency in Python for ML engineering and model development.
- Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
- Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
- Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
- Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.

Preferred:
- Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving.
- Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
- Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).

Benefits & Culture Highlights:
- Collaborative, engineering-driven culture with strong focus on ownership and rapid iteration.
- Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
- On-site role with hands-on access to GPU infrastructure and cross-functional product teams.

Skills: pytorch, cuda, docker, python, agentic, llm (ref: hirist.tech)

LLM Engineer - Machine Learning

More Jobs at Zorba Consulting India Pvt. Ltd.
