Senior AI Voice Systems Engineer (WebRTC, GPU, Real-Time Voice, 1000+ Concurrent Calls)

JobTalk AI
Full Time · Senior
Pimpri-Chinchwad, Maharashtra, IN
Posted March 21, 2026

Job Description

Overview

We are building a self-hosted AI Voice Agent platform for recruitment.

System requirements:

  • 1000+ concurrent AI conversations
  • 3000+ outbound calls (Twilio/Telnyx)
  • Real-time voice, low latency, high quality

You will review, fix, and scale our existing system.

This is a systems and real-time media engineering role, not a prompt/API role.

Core Responsibilities

Note: most of the coding is done with Claude Code.

  • Audit current architecture and identify bottlenecks
  • Redesign for low latency and high concurrency
  • Build call orchestration and throttling system
  • Optimize GPU usage across multiple nodes
  • Ensure:
      • STT/TTS/LLM run only after a human answers
      • Stable performance under burst traffic
  • Improve:
      • Latency
      • Voice quality
      • Turn-taking and interruptions

Core Requirements (MANDATORY)

WebRTC

  • Strong experience with:
      • RTP, ICE, STUN/TURN
      • Real-time audio streaming
  • Experience working with:
      • LiveKit
      • Daily
  • Must understand internal architecture, not just SDK usage

Telephony

  • Twilio or Telnyx programmable voice
  • SIP call flows and call state handling
  • PSTN ↔ WebRTC bridging

Voice AI Stack

  • VAD, AMD
  • Turn detection (endpointing, barge-in, silence handling)
  • Streaming STT/TTS pipelines
  • Full-duplex conversational systems

Audio Processing

  • Voice isolation and noise reduction (real-time)
  • Experience comparable to Krisp-level quality
  • Echo cancellation, gain control, PSTN-quality audio handling

Modeling & Inference

  • Fine-tuning STT and/or TTS models (required)
  • Experience running models locally (no API reliance)
  • ONNX / ONNX Runtime (required)
  • GPU inference optimization:
      • CUDA
      • Batching vs. streaming
      • Latency tuning

Concurrency & Systems

  • Experience handling:
      • 1000+ concurrent sessions
      • WebSocket streaming at scale
  • Strong understanding of:
      • Backpressure
      • Queue systems
      • Event-driven architecture

Programming

  • Python (async, multiprocessing, GPU integration) OR
  • Rust (preferred for performance systems)

Infrastructure

  • Fully self-hosted (no paid STT/TTS APIs)
  • GPU setup:
      • 8× RTX 5090
      • 8× H200

You must be comfortable working directly with GPU workloads.

Critical Capability

You must demonstrate:

  • Triggering the AI pipeline only after human-answer detection
  • GPU-aware call throttling
  • Scaling to 1000+ concurrent conversations
  • Maintaining low latency under load
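
As a rough illustration of the first two capabilities, the sketch below gates a stubbed AI pipeline on the answer-detection result and admits a call only when a GPU node has a free slot. It is a minimal sketch under stated assumptions, not a description of our system: `GpuNode`, `GpuAwareThrottle`, `handle_call`, the slot counts, and the `"human"`/`"machine"` labels are all hypothetical names chosen for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GpuNode:
    """Hypothetical GPU node with a fixed number of concurrent session slots."""
    name: str
    capacity: int
    active: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.active


class GpuAwareThrottle:
    """Admits a call to the AI pipeline only when some node has a free slot."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._cond = asyncio.Condition()

    async def acquire(self) -> GpuNode:
        async with self._cond:
            # Block until at least one node has spare capacity.
            await self._cond.wait_for(lambda: any(n.free > 0 for n in self._nodes))
            node = max(self._nodes, key=lambda n: n.free)  # least-loaded node
            node.active += 1
            return node

    async def release(self, node: GpuNode) -> None:
        async with self._cond:
            node.active -= 1
            self._cond.notify_all()  # wake calls waiting for a slot


async def handle_call(call_id, answered_by, throttle, log):
    """Run the (stubbed) AI pipeline only for calls answered by a human."""
    if answered_by != "human":
        log.append((call_id, "skipped"))  # voicemail or no answer: no GPU work
        return
    node = await throttle.acquire()
    try:
        await asyncio.sleep(0)  # placeholder for streaming STT -> LLM -> TTS
        log.append((call_id, node.name))
    finally:
        await throttle.release(node)


async def demo():
    # Assumed capacities: three slots in total across two nodes.
    throttle = GpuAwareThrottle([GpuNode("gpu-0", 2), GpuNode("gpu-1", 1)])
    log = []
    calls = [(1, "human"), (2, "machine"), (3, "human"), (4, "human"), (5, "human")]
    await asyncio.gather(*(handle_call(c, a, throttle, log) for c, a in calls))
    return log
```

Running `asyncio.run(demo())` processes four human-answered calls across three slots, with the fourth waiting until a slot is released, while the machine-answered call never touches a GPU. A real system would derive capacity from measured GPU load rather than a fixed slot count.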

Nice to Have

  • FreeSWITCH / Asterisk
  • Kafka / NATS / Redis streams
  • SFU/MCU architecture experience

Deliverables

  • Architecture review (issues + fixes)
  • Scalable system design
  • Optimized orchestration and GPU usage
  • Production-ready implementation
