Job Description
About the Role
We are looking for a hands-on Applied Machine Learning + Generative AI Engineer to drive experimentation and innovation within our ML team. This role will focus on exploring the evolving GenAI landscape (LLMs, RAG, agents, multimodal models) and translating those advancements into practical tools, frameworks, and best practices that improve team productivity and model quality.
Role Charter: End-to-End Ownership
The MLE will own the full ML production lifecycle, including:
- Model deployment & ML inference (offline + real-time)
- Feature pipelines (offline and real-time, with Retina) & feature store adoption
- Observability, alerting, and cost monitoring
- Reliability, latency, and pipeline uptime
- Enabling MLS productivity through tooling, frameworks, and best practices
- Clear ownership with MLE as the single point of contact between Data Science and ML Platform
- First-class ML observability covering data, model, infra, cost, and latency metrics
- Real-time signals ingestion with latency SLAs
Gen-AI Goals
- Accelerate development via AI-assisted coding
- Auto-generation of boilerplate pipelines & monitoring configs
- Intelligent alerts & root-cause hints for production issues
- Track and evaluate advancements in LLMs, RAG, and agent frameworks
- Prototype and build internal GenAI tools to improve:
  - Experimentation workflows
  - Code and documentation automation
- Design and implement RAG-based systems
- Establish best practices for prompt engineering, evaluation, and cost optimization
- Conduct internal knowledge-sharing sessions to uplift team capability
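The cost-optimization practice mentioned above starts with knowing what each LLM call costs. A minimal sketch of per-request cost tracking, assuming hypothetical per-token rates (`PRICE_PER_1K` and both model names are placeholders; real rates come from the provider's pricing page):

```python
# Hypothetical USD rates per 1,000 tokens -- placeholders, not real pricing.
PRICE_PER_1K = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    rates = PRICE_PER_1K[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1000

# Example: a call with 1,200 prompt tokens and 300 completion tokens.
cost = request_cost("small-model", 1200, 300)
print(f"${cost:.5f}")
```

Aggregating these per-call estimates per pipeline is one simple way to feed the continuous cost monitoring described under Cost & Efficiency.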
MLE Key Result Areas (KRAs)
Operational Excellence
- Improve Mean Time to Detect (MTTD) production issues
- Improve Mean Time to Respond (MTTR) (co-owned with MLSs)
- Own pipeline uptime and reliability SLAs
- MLE POD to publish operational excellence (OE) metrics for leadership reviews
Cost & Efficiency
- Reduce operational cost by X% across PODs
- Continuous cost monitoring for Ads, Reco, and Personalization
- Optimize existing production pipelines
Productivity Improvements
- Build & evangelize utility libraries (DS-MLP)
- Drive adoption of Gen-AI tools for code completion & development
- Assist MLSs with production deployments on the inference engine
- Perform code reviews with MLSs
- Create frameworks for latency evaluation of real-time models
Observability & Alerting
- Dashboards for ML-Ops metrics (Kibana/Grafana)
- Alerts on coverage metrics & regression tests
- Automated alerts to Hangout channels (e.g., Ads dashboards)
Platform & Horizontal Initiatives
- Continuous feature development on the ML Inference Engine, in partnership with the ML Inference platform team
- One-point ownership between DS teams and ML Inference platform
- Offline Feature Store for model training & inference
🛠 What We’re Looking For
- 3–6 years of experience in ML / AI
- Strong Python skills
- Hands-on experience with LLM APIs (OpenAI / Anthropic / Gemini / OSS models)
- Experience building RAG pipelines
- Familiarity with vector databases (FAISS, Pinecone, etc.)
- Curiosity-driven, experimentation mindset
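To make the RAG and vector-database requirements concrete, here is a toy sketch of the retrieval step only: a tiny in-memory store with cosine similarity. In a real pipeline a library such as FAISS (or a managed service such as Pinecone) would handle indexing and approximate nearest-neighbour search, and an embedding model would produce the vectors; the 4-dimensional embeddings below are made-up stand-ins.

```python
import numpy as np

documents = [
    "Feature store adoption guide",
    "Real-time inference latency tuning",
    "Prompt engineering best practices",
]

# Hypothetical embeddings; a real system would call an embedding model.
doc_vectors = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.1],
])

def retrieve(query_vector, k=1):
    """Return indices of the k documents most similar to the query
    by cosine similarity (dot product over the product of norms)."""
    q = np.asarray(query_vector, dtype=float)
    sims = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return np.argsort(sims)[::-1][:k]

top = retrieve([0.0, 0.2, 0.8, 0.1], k=1)
print(documents[top[0]])  # prints "Prompt engineering best practices"
```

The retrieved documents would then be injected into the LLM prompt as context, which is the "generation" half of a RAG pipeline.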