Senior AI Voice Systems Engineer (WebRTC, GPU, Real-Time Voice, 1000+ Concurrent Calls)
Job Description
Overview
We are building a self-hosted AI Voice Agent platform for recruitment.
System requirements:
- 1000+ concurrent AI conversations
- 3000+ outbound calls (Twilio/Telnyx)
- Real-time voice, low latency, high quality
You will review, fix, and scale our existing system.
This is a systems and real-time media engineering role, not a prompt/API role.
Core Responsibilities
Most of the coding is done with Claude Code.
- Audit current architecture and identify bottlenecks
- Redesign for low latency and high concurrency
- Build call orchestration and throttling system
- Optimize GPU usage across multiple nodes
- Ensure:
- The STT/TTS/LLM pipeline runs only after a human answers
- Stable performance under burst traffic
- Improve:
- Latency
- Voice quality
- Turn-taking and interruptions
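The orchestration and gating responsibilities above can be sketched as a minimal asyncio orchestrator. All names, the semaphore-based throttle, and the "answered_by" result format are illustrative assumptions, not the platform's actual API:

```python
import asyncio

class CallOrchestrator:
    """Sketch: cap concurrent AI pipelines and start STT/TTS/LLM work
    only after answering-machine detection reports a human answer."""

    def __init__(self, max_concurrent_pipelines: int):
        # GPU-aware cap: in practice sized from measured per-call GPU cost.
        self._slots = asyncio.Semaphore(max_concurrent_pipelines)

    async def on_call_answered(self, call_id: str, answered_by: str):
        if answered_by != "human":       # voicemail/fax: never touch the GPUs
            return None
        async with self._slots:          # throttle: wait for a free pipeline slot
            return await self._run_pipeline(call_id)

    async def _run_pipeline(self, call_id: str):
        # Placeholder for the streaming STT -> LLM -> TTS loop.
        await asyncio.sleep(0.01)
        return f"pipeline-finished:{call_id}"

async def demo():
    orch = CallOrchestrator(max_concurrent_pipelines=2)
    return await asyncio.gather(
        orch.on_call_answered("c1", "human"),
        orch.on_call_answered("c2", "machine"),
        orch.on_call_answered("c3", "human"),
    )

results = asyncio.run(demo())
```

The key property is that rejected calls return before any slot is taken, so machine-answered calls consume no GPU capacity at all.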
Core Requirements (MANDATORY)
WebRTC
- Strong experience with:
- RTP, ICE, STUN/TURN
- Real-time audio streaming
- Experience working with:
- LiveKit
- Daily
- Must understand their internal architecture, not just SDK usage
Telephony
- Twilio or Telnyx programmable voice
- SIP call flows and call state handling
- PSTN ↔ WebRTC bridging
Voice AI Stack
- VAD (voice activity detection), AMD (answering machine detection)
- Turn detection (endpointing, barge-in, silence handling)
- Streaming STT/TTS pipelines
- Full-duplex conversational systems
Audio Processing
- Voice isolation and noise reduction (real-time)
- Noise suppression quality comparable to Krisp
- Echo cancellation, gain control, PSTN-quality audio handling
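Of the processing stages above, gain control is the easiest to sketch. This is a toy frame-wise AGC with an assumed target level and gain clamp, not a production algorithm (real AGC smooths gain across frames to avoid pumping):

```python
TARGET_RMS = 3000.0   # assumed target level for int16 telephony audio
MAX_GAIN = 4.0        # clamp so silence is not boosted into noise

def apply_agc(frame: list[int]) -> list[int]:
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5 or 1.0
    gain = min(TARGET_RMS / rms, MAX_GAIN)
    # Scale, then clamp back into the int16 range.
    return [max(-32768, min(32767, int(s * gain))) for s in frame]

quiet = [100, -100] * 80     # RMS 100: well below target
boosted = apply_agc(quiet)   # gain clamps at MAX_GAIN
```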
Modeling & Inference
- Fine-tuning STT and/or TTS models (required)
- Experience running models locally (no API reliance)
- ONNX / ONNX Runtime (required)
- GPU inference optimization:
- CUDA
- batching vs streaming
- latency tuning
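The batching-vs-streaming trade-off above usually lands on micro-batching: collect requests until the batch fills or a latency deadline expires, then run one GPU call. A sketch with a stand-in `infer_fn` in place of a real ONNX Runtime session (all names are assumptions):

```python
import asyncio

class MicroBatcher:
    """Sketch: trade a bounded amount of latency for GPU throughput."""

    def __init__(self, max_batch: int, max_wait_ms: float, infer_fn):
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.infer_fn = infer_fn          # runs one batched inference call
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                  # resolved when the batch runs

    async def run_once(self):
        batch = [await self.queue.get()]  # block until the first request
        deadline = asyncio.get_running_loop().time() + self.max_wait
        while len(batch) < self.max_batch:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break                     # latency budget spent: run now
            try:
                batch.append(await asyncio.wait_for(self.queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = self.infer_fn([item for item, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

async def demo():
    batcher = MicroBatcher(max_batch=4, max_wait_ms=5,
                           infer_fn=lambda xs: [x * 2 for x in xs])
    worker = asyncio.ensure_future(batcher.run_once())
    results = await asyncio.gather(batcher.submit(1), batcher.submit(2))
    await worker
    return results

results = asyncio.run(demo())
```

`max_wait_ms` is the latency-tuning knob: smaller values favor per-call latency, larger ones favor GPU utilization.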
Concurrency & Systems
- Experience handling:
- 1000+ concurrent sessions
- WebSocket streaming at scale
- Strong understanding of:
- backpressure
- queue systems
- event-driven architecture
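For real-time audio, backpressure usually means shedding stale frames rather than blocking the producer. A minimal sketch of one assumed policy (drop-oldest on a bounded per-call queue):

```python
import asyncio

class BoundedFrameQueue:
    """Sketch: bounded jitter queue that sheds the oldest frame when a
    slow consumer falls behind, keeping playback near real time and
    memory bounded."""

    def __init__(self, max_frames: int):
        self.q: asyncio.Queue = asyncio.Queue(maxsize=max_frames)
        self.dropped = 0

    def push(self, frame):
        if self.q.full():
            self.q.get_nowait()   # drop the oldest frame (backpressure policy)
            self.dropped += 1
        self.q.put_nowait(frame)

    async def pop(self):
        return await self.q.get()

q = BoundedFrameQueue(max_frames=3)
for i in range(5):                # producer outruns the consumer by 2 frames
    q.push(i)
remaining = [q.q.get_nowait() for _ in range(q.q.qsize())]
```

The same shape applies at the WebSocket layer: a bounded send buffer per connection, with a drop or disconnect policy when it fills.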
Programming
- Python (async, multiprocessing, GPU integration) OR
- Rust (preferred for performance systems)
Infrastructure
- Fully self-hosted (no paid STT/TTS APIs)
- GPU setup:
- 8× RTX 5090
- 8× H200
You must be comfortable working directly with GPU workloads.
Critical Capability
You must demonstrate:
- Triggering the AI pipeline only after human-answer detection
- GPU-aware call throttling
- Scaling to 1000+ concurrent conversations
- Maintaining low latency under load
Nice to Have
- FreeSWITCH / Asterisk
- Kafka / NATS / Redis streams
- SFU/MCU architecture experience
Deliverables
- Architecture review (issues + fixes)
- Scalable system design
- Optimized orchestration and GPU usage
- Production-ready implementation