Job Description
Role Overview
We are seeking an experienced Machine Learning Engineer specializing in Large Language Models (LLMs) to design, develop, and deploy advanced AI systems operating at production scale. In this role, you will build intelligent systems capable of delivering millions of predictions daily while maintaining strict standards for performance, scalability, latency, and cost efficiency.
You will work across the entire machine learning lifecycle—from data ingestion and experimentation to production deployment and continuous optimization—helping drive the development of highly reliable and performant AI-powered solutions.
This role requires strong expertise in LLM architecture, model adaptation techniques, MLOps, and Retrieval-Augmented Generation (RAG) systems, along with hands-on experience building end-to-end machine learning pipelines.
Key Responsibilities
Machine Learning & Model Development
- Design, implement, and deploy advanced machine learning models and large language models that operate at large scale while maintaining high levels of accuracy and responsiveness.
- Optimize models to support high-throughput prediction workloads, ensuring minimal latency and efficient resource utilization.
- Continuously refine models through experimentation, evaluation, and iterative improvements.
End-to-End ML Pipeline Development
- Develop and maintain complete ML pipelines that support data ingestion, preprocessing, feature engineering, model training, validation, and production deployment.
- Implement automated workflows supporting OCR processing, data extraction, and transformation pipelines that feed downstream machine learning systems.
- Ensure production pipelines are resilient, scalable, and capable of supporting evolving model requirements.
LLM Fine-Tuning and Adaptation
- Build and manage fine-tuning workflows for large language models, applying various adaptation techniques to optimize model performance for specific use cases.
- Implement approaches including:
  - Full model fine-tuning
  - LoRA (Low-Rank Adaptation)
  - Prompt engineering and prompt-based optimization
  - Direct Preference Optimization (DPO) and other alignment techniques
- Evaluate the effectiveness of different adaptation methods to determine the most efficient approach for each application.
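To make the LoRA technique named above concrete, here is a minimal NumPy sketch of the low-rank update idea (illustrative only, not part of the role; dimensions and the `alpha` scaling are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight matrix (d_out x d_in); stays fixed during adaptation.
d_out, d_in, r = 8, 16, 2
W = rng.normal(size=(d_out, d_in))

# LoRA factors: only A and B are trained. B starts at zero so the
# adapted model initially behaves exactly like the base model.
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, rank r
B = np.zeros((d_out, r))                     # trainable, rank r
alpha = 4.0                                  # scaling hyperparameter

def lora_forward(x):
    """Base projection plus the scaled low-rank update (alpha/r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B == 0 the adapted output matches the frozen base exactly.
assert np.allclose(lora_forward(x), W @ x)
```

The appeal in practice is that only `r * (d_in + d_out)` parameters are trained instead of `d_in * d_out`, which is what makes adaptation cheap for large models.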
Model Architecture and Optimization
- Explore and experiment with new LLM architecture designs, balancing performance, computational overhead, memory usage, and deployment constraints.
- Improve inference performance through techniques such as:
  - Model quantization
  - Model compression
  - Knowledge distillation and teacher–student model architectures
- Enable deployment of models in environments with limited compute resources while maintaining strong performance.
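As an illustration of the quantization item above (a toy sketch, not the posting's own method), symmetric per-tensor int8 quantization can be expressed in a few lines of NumPy:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; rounding error is at most scale/2.
assert q.dtype == np.int8
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Production schemes typically refine this with per-channel scales and calibration data, but the storage/accuracy trade-off is the same.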
Retrieval-Augmented Generation (RAG) Systems
- Design and deploy RAG-based AI systems that combine generative models with retrieval capabilities.
- Implement semantic search pipelines using:
  - Vector databases
  - Embedding generation
  - Document chunking strategies
  - Indexing and retrieval workflows
- Utilize frameworks and platforms such as LangChain, LlamaIndex, and cloud-native RAG solutions within environments including Google Cloud Platform and Databricks.
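The retrieval steps listed above (chunking, embedding, indexing, retrieval) can be sketched end to end in miniature. This toy uses a hashed bag-of-words vector in place of a real embedding model and a NumPy array in place of a vector database; all function names are illustrative, not from LangChain or LlamaIndex:

```python
import numpy as np

def chunk(text, size=40):
    """Fixed-size character chunking; real pipelines often chunk by tokens."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding standing in for a real model."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, chunks, k=1):
    """Cosine-similarity search over an in-memory 'index' of chunk vectors."""
    index = np.stack([embed(c) for c in chunks])
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

chunks = ["the cat sat on the mat", "gradient descent updates weights"]
assert retrieve("cat on a mat", chunks) == ["the cat sat on the mat"]
```

In a real RAG system the retrieved chunks would then be injected into the LLM prompt as grounding context; swapping in a learned embedding model and a vector database changes the components but not this flow.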
ML Evaluation and Continuous Improvement
- Build automated evaluation systems to assess model quality, accuracy, and reliability.
- Implement pipelines that support:
  - Automated ground truth generation
  - Continuous evaluation and benchmarking
  - Feedback loops that improve models over time
- Drive ongoing optimization by analyzing production performance and retraining models as needed.
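A minimal sketch of the evaluation-and-promotion loop described above (metric choice and promotion margin are illustrative assumptions, not requirements from the posting):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers,
    ignoring case and surrounding whitespace."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def should_promote(candidate_acc, baseline_acc, margin=0.01):
    """Deployment gate: promote a candidate model only if it beats the
    current production baseline by at least `margin`."""
    return candidate_acc >= baseline_acc + margin

acc = exact_match_accuracy(["Paris", "Rome"], ["paris", "Berlin"])  # 0.5
assert acc == 0.5
```

Continuous evaluation then amounts to running this gate on every candidate against a held-out (ideally auto-generated) ground-truth set before retraining or rollout.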
MLOps and Production Systems
- Design and implement CI/CD pipelines for machine learning workloads, enabling rapid iteration and reliable deployments.
- Ensure production systems maintain:
  - High availability
  - Strong reliability
  - Scalable performance
- Support monitoring, logging, and alerting systems to track model health and performance.
Required Qualifications
- 5+ years of professional experience in machine learning engineering, with hands-on experience deploying ML or NLP systems in production environments.
- Demonstrated success developing and operating large-scale ML systems that support high-volume workloads.
- Extensive experience building end-to-end ML solutions, including data pipelines, model training workflows, evaluation frameworks, and deployment processes.
- Deep expertise working with large language models and model adaptation techniques, including full fine-tuning, LoRA, prompt-based optimization, and preference-based training methods such as DPO.
- Practical experience designing and optimizing LLM architectures while balancing performance requirements with compute and memory constraints.
- Strong knowledge of model optimization strategies including quantization, compression, and knowledge distillation for efficient inference.
- Hands-on experience designing and implementing RAG architectures, including vector search, embeddings, semantic retrieval, and document indexing strategies.
- Familiarity with modern LLM orchestration frameworks such as LangChain and LlamaIndex.
- Experience working within cloud-based AI platforms, particularly environments such as Google Cloud Platform and Databricks.
- Solid understanding of MLOps practices, including automated model evaluation, CI/CD pipelines for ML, and monitoring of production models.
- Strong programming skills in Python and experience using modern machine learning and deep learning frameworks.
Company Description
WAVSYS is a national solutions company offering contract, permanent, and turnkey staffing solutions by leveraging its international network of 20 offices across the USA, Canada, and the UK.