Job Description
Senior Data Scientist
Speech, Voice & Conversational AI
Department: Data Science & AI
Experience: 12–15 Years
Role Overview
We are seeking a highly experienced Senior Data Scientist – Speech, Voice & Conversational AI to lead the architecture, design, and delivery of next-generation voice and speech AI solutions. This role sits at the intersection of deep machine learning expertise and practical product engineering, driving end-to-end voice AI capabilities across Firstsource’s service lines.
The ideal candidate brings 12–15 years of progressive experience in data science with a strong specialization in speech and voice technologies, along with hands-on expertise in Generative AI, Agentic AI frameworks, and modern voice pipeline tooling. You will act as a technical thought leader, shaping our voice AI strategy, mentoring teams, and collaborating with cross-functional stakeholders to deliver production-grade solutions at scale.
Key Responsibilities
Voice & Speech AI Architecture
- Design and own the end-to-end architecture for voice AI solutions including real-time speech-to-text (STT), text-to-speech (TTS), voice-to-voice, speaker diarization, emotion detection, and voice biometrics.
- Evaluate, benchmark, and integrate leading speech platforms and APIs such as Google Cloud Speech, Amazon Transcribe, Azure Speech Services, Whisper (OpenAI), Deepgram, AssemblyAI, ElevenLabs, and PlayHT.
- Build robust voice pipelines that handle noise cancellation, language identification, accent adaptation, and real-time streaming at production scale.
Generative AI & Agentic AI
- Architect and deploy GenAI-powered conversational agents leveraging Large Language Models (LLMs) such as GPT-4, Claude, Gemini, and open-source alternatives (LLaMA, Mistral).
- Design Agentic AI workflows using frameworks such as LangChain, LangGraph, CrewAI, AutoGen, and Semantic Kernel to build multi-step, tool-using voice agents.
- Implement Retrieval-Augmented Generation (RAG) pipelines with vector databases (Pinecone, Weaviate, Qdrant, Chroma) for context-aware voice assistants.
- Drive prompt engineering strategies and fine-tuning approaches (LoRA, QLoRA, RLHF) to optimize LLM performance for speech-centric use cases.
Solution Design & Delivery
- Lead solution design workshops with clients and internal stakeholders to translate business requirements into scalable voice AI architectures.
- Define technical roadmaps, establish best practices, and create reusable solution accelerators for voice and conversational AI.
- Own proof-of-concept (POC) development through to production deployment, working closely with MLOps and engineering teams.
Leadership & Mentoring
- Mentor and upskill a team of data scientists and ML engineers on speech AI and GenAI best practices.
- Represent Firstsource as a subject-matter expert in voice AI at internal reviews, client presentations, and industry forums.
- Stay current on rapidly evolving GenAI, speech, and agentic AI research and translate insights into actionable opportunities.
Technical Skills & Tooling
Domain: Required Proficiency
- Speech-to-Text (STT): Whisper, Google Cloud Speech, Azure Speech, Amazon Transcribe, Deepgram, AssemblyAI, Kaldi
- Text-to-Speech (TTS): ElevenLabs, PlayHT, Azure Neural TTS, Amazon Polly, Google WaveNet, Tortoise TTS, Bark
- Voice-to-Voice: Real-time duplex pipelines, WebRTC integration, voice cloning, prosody transfer, streaming architectures
- LLM & GenAI: GPT-4/4o, Claude, Gemini, LLaMA, Mistral, fine-tuning (LoRA/QLoRA), RLHF, prompt engineering
- Agentic AI Frameworks: LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, function calling, tool-use patterns
- RAG & Vector DBs: Pinecone, Weaviate, Qdrant, Chroma, FAISS, embedding models, hybrid search
- ML / Deep Learning: PyTorch, TensorFlow, Transformers (HuggingFace), audio feature engineering (MFCCs, spectrograms)
- Cloud & MLOps: AWS / Azure / GCP, Docker, Kubernetes, MLflow, model serving (Triton, TorchServe, vLLM)
- Programming: Python (advanced), SQL, familiarity with Rust/C++ for performance-critical audio processing
- Telephony & Contact Center: Twilio, Genesys, Amazon Connect, SIP/VoIP protocols, CCAI (Google Contact Center AI)
Qualifications & Experience
- 12–15 years of progressive experience in Data Science, Machine Learning, or AI Engineering, including at least 5 years focused on speech, voice, or audio ML.