Senior AI Voice Systems Engineer (WebRTC, GPU, Real-Time Voice, 1000+ Concurrent Calls)
Job Description
Overview
We are building a self-hosted AI Voice Agent platform for recruitment.
System requirements:
- 1000+ concurrent AI conversations
- 3000+ outbound calls (Twilio/Telnyx)
- Real-time voice, low latency, high quality
You will review, fix, and scale our existing system.
This is a systems and real-time media engineering role, not a prompt/API role.
Core Responsibilities
Most of the coding is done with Claude Code.
- Audit current architecture and identify bottlenecks
- Redesign for low latency and high concurrency
- Build call orchestration and throttling system
- Optimize GPU usage across multiple nodes
- Ensure:
- The STT/TTS/LLM pipeline runs only after a human answers
- Stable performance under burst traffic
- Improve:
- Latency
- Voice quality
- Turn-taking and interruptions
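The orchestration and gating responsibilities above can be sketched as a minimal asyncio orchestrator. All names, the semaphore-based throttle, and the "answered_by" result format are illustrative assumptions, not the platform's actual API:

```python
import asyncio

class CallOrchestrator:
    """Sketch: cap concurrent AI pipelines and start STT/TTS/LLM work
    only after answering-machine detection reports a human answer."""

    def __init__(self, max_concurrent_pipelines: int):
        # GPU-aware cap: in practice sized from measured per-call GPU cost.
        self._slots = asyncio.Semaphore(max_concurrent_pipelines)

    async def on_call_answered(self, call_id: str, answered_by: str):
        if answered_by != "human":       # voicemail/fax: never touch the GPUs
            return None
        async with self._slots:          # throttle: wait for a free pipeline slot
            return await self._run_pipeline(call_id)

    async def _run_pipeline(self, call_id: str):
        # Placeholder for the streaming STT -> LLM -> TTS loop.
        await asyncio.sleep(0.01)
        return f"pipeline-finished:{call_id}"

async def demo():
    orch = CallOrchestrator(max_concurrent_pipelines=2)
    return await asyncio.gather(
        orch.on_call_answered("c1", "human"),
        orch.on_call_answered("c2", "machine"),
        orch.on_call_answered("c3", "human"),
    )

results = asyncio.run(demo())
```

The key property is that rejected calls return before any slot is taken, so machine-answered calls consume no GPU capacity at all.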
Core Requirements (MANDATORY)
WebRTC
- Strong experience with:
- RTP, ICE, STUN/TURN
- Real-time audio streaming
- Experience working with:
- LiveKit
- Daily
- Must understand their internal architecture, not just SDK usage
Telephony
- Twilio or Telnyx programmable voice
- SIP call flows and call state handling
- PSTN ↔ WebRTC bridging
Voice AI Stack
- VAD (voice activity detection), AMD (answering machine detection)
- Turn detection (endpointing, barge-in, silence handling)
- Streaming STT/TTS pipelines
- Full-duplex conversational systems
Audio Processing
- Voice isolation and noise reduction (real-time)
- Noise suppression quality comparable to Krisp
- Echo cancellation, gain control, PSTN-quality audio handling
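Of the processing stages above, gain control is the easiest to sketch. This is a toy frame-wise AGC with an assumed target level and gain clamp, not a production algorithm (real AGC smooths gain across frames to avoid pumping):

```python
TARGET_RMS = 3000.0   # assumed target level for int16 telephony audio
MAX_GAIN = 4.0        # clamp so silence is not boosted into noise

def apply_agc(frame: list[int]) -> list[int]:
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5 or 1.0
    gain = min(TARGET_RMS / rms, MAX_GAIN)
    # Scale, then clamp back into the int16 range.
    return [max(-32768, min(32767, int(s * gain))) for s in frame]

quiet = [100, -100] * 80     # RMS 100: well below target
boosted = apply_agc(quiet)   # gain clamps at MAX_GAIN
```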
Modeling & Inference
- Fine-tuning STT and/or TTS models (required)
- Experience running models locally (no API reliance)
- ONNX / ONNX Runtime (required)
- GPU inference optimization:
- CUDA
- batching vs streaming
- latency tuning
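The batching-vs-streaming trade-off above usually lands on micro-batching: collect requests until the batch fills or a latency deadline expires, then run one GPU call. A sketch with a stand-in `infer_fn` in place of a real ONNX Runtime session (all names are assumptions):

```python
import asyncio

class MicroBatcher:
    """Sketch: trade a bounded amount of latency for GPU throughput."""

    def __init__(self, max_batch: int, max_wait_ms: float, infer_fn):
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.infer_fn = infer_fn          # runs one batched inference call
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                  # resolved when the batch runs

    async def run_once(self):
        batch = [await self.queue.get()]  # block until the first request
        deadline = asyncio.get_running_loop().time() + self.max_wait
        while len(batch) < self.max_batch:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break                     # latency budget spent: run now
            try:
                batch.append(await asyncio.wait_for(self.queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = self.infer_fn([item for item, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

async def demo():
    batcher = MicroBatcher(max_batch=4, max_wait_ms=5,
                           infer_fn=lambda xs: [x * 2 for x in xs])
    worker = asyncio.ensure_future(batcher.run_once())
    results = await asyncio.gather(batcher.submit(1), batcher.submit(2))
    await worker
    return results

results = asyncio.run(demo())
```

`max_wait_ms` is the latency-tuning knob: smaller values favor per-call latency, larger ones favor GPU utilization.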
Concurrency & Systems
- Experience handling:
- 1000+ concurrent sessions
- WebSocket streaming at scale
- Strong understanding of:
- backpressure
- queue systems
- event-driven architecture
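For real-time audio, backpressure usually means shedding stale frames rather than blocking the producer. A minimal sketch of one assumed policy (drop-oldest on a bounded per-call queue):

```python
import asyncio

class BoundedFrameQueue:
    """Sketch: bounded jitter queue that sheds the oldest frame when a
    slow consumer falls behind, keeping playback near real time and
    memory bounded."""

    def __init__(self, max_frames: int):
        self.q: asyncio.Queue = asyncio.Queue(maxsize=max_frames)
        self.dropped = 0

    def push(self, frame):
        if self.q.full():
            self.q.get_nowait()   # drop the oldest frame (backpressure policy)
            self.dropped += 1
        self.q.put_nowait(frame)

    async def pop(self):
        return await self.q.get()

q = BoundedFrameQueue(max_frames=3)
for i in range(5):                # producer outruns the consumer by 2 frames
    q.push(i)
remaining = [q.q.get_nowait() for _ in range(q.q.qsize())]
```

The same shape applies at the WebSocket layer: a bounded send buffer per connection, with a drop or disconnect policy when it fills.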
Programming
- Python (async, multiprocessing, GPU integration) OR
- Rust (preferred for performance systems)
Infrastructure
- Fully self-hosted (no paid STT/TTS APIs)
- GPU setup:
- 8× RTX 5090
- 8× H200
You must be comfortable working directly with GPU workloads.
Critical Capability
You must demonstrate:
- Triggering the AI pipeline only after human-answer detection
- GPU-aware call throttling
- Scaling to 1000+ concurrent conversations
- Maintaining low latency under load
Nice to Have
- FreeSWITCH / Asterisk
- Kafka / NATS / Redis streams
- SFU/MCU architecture experience
Deliverables
- Architecture review (issues + fixes)
- Scalable system design
- Optimized orchestration and GPU usage
- Production-ready implementation