AI Benchmark Engineer (Knowledge/Research)

Full Time, Mid-level

Mumbai, Maharashtra, IN

Posted 9 days ago

Role Overview

Turing is hiring a mid-level AI Benchmark Engineer (Knowledge/Research). This is a full-time role listed in Mumbai, posted 9 days ago. Full responsibilities, required qualifications, and the apply link are listed in the description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Docker, Oracle, P&L, OR, Turing, Based, San

Job Description

About Turing:

Based in San Francisco, California, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, plus top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.

Role Overview:

We are seeking a highly analytical and computationally proficient individual with a strong research background to join our team. You will contribute either by crafting challenging and insightful problems in your research domain or by devising elegant computational solutions.

Responsibilities

Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
Design LLM judge prompts that evaluate agent output field-by-field against the oracle
Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
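
To make the oracle and field-by-field evaluation responsibilities concrete, here is a minimal sketch in Python. It is illustrative only and not Turing's actual format: the field names, values, and the `score_fields` helper are hypothetical, and a deterministic comparison stands in for the LLM judge described above.

```python
import json

# Hypothetical ground-truth oracle. Field names and values are illustrative
# assumptions, not taken from any real benchmark task.
ORACLE_JSON = """
{
  "paper_count": 3,
  "primary_method": "randomized controlled trial",
  "sample_sizes": [120, 85, 240]
}
"""


def normalize(value):
    """Trim and lowercase strings so trivial formatting differences are ignored."""
    if isinstance(value, str):
        return value.strip().lower()
    return value


def score_fields(oracle: dict, answer: dict) -> dict:
    """Return one verdict per oracle field: True only on an exact match."""
    return {
        field: field in answer and normalize(answer[field]) == normalize(expected)
        for field, expected in oracle.items()
    }


oracle = json.loads(ORACLE_JSON)

# Hypothetical agent output: one field differs from the oracle.
agent_answer = {
    "paper_count": 3,
    "primary_method": " Randomized Controlled Trial ",
    "sample_sizes": [120, 85, 200],
}

verdicts = score_fields(oracle, agent_answer)
accuracy = sum(verdicts.values()) / len(verdicts)
print(verdicts)
print(f"field accuracy: {accuracy:.2f}")
```

Scoring each field separately shows which specific fact the agent got wrong, rather than only reporting that the overall answer failed.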

Required Qualifications:

5+ years of research experience (academic or industry) in any scientific domain
Strong reading comprehension with ability to extract structured data from unstructured text
Experience with JSON and data structures, including schema design and output validation
Proficiency in Python scripting for data processing and evaluation (e.g., judge scripts)
Familiarity with AI coding benchmarks such as SWE-bench and Terminal-bench
Hands-on experience with Docker (writing Dockerfiles, building images, debugging containers)
High attention to detail, especially for creating precise evaluation oracles without approximations

Nice to have

Experience with systematic reviews, meta-analyses, or large-scale literature surveys
Familiarity with medical, legal, or scientific document analysis
Experience with NLP or information extraction tasks
Knowledge of LLM evaluation and benchmarking (e.g., MMLU, GPQA, SimpleQA)
Experience curating datasets for AI evaluation

Perks of Freelancing With Turing:

Work in a fully remote environment.
Opportunity to work on cutting-edge AI projects with leading LLM companies.
Potential for contract extension based on performance and project needs.

Offer Details:

Commitments Required: 40 hours/week with 4 hours of PST overlap
Engagement type: Contractor assignment/freelancer (no medical/paid leave)
Duration of contract: 1 month; [expected start date is next week]

Frequently Asked Questions

How do I apply for the AI Benchmark Engineer (Knowledge/Research) position at Turing?

Use the Apply button above to submit your application directly to Turing. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Where is the AI Benchmark Engineer (Knowledge/Research) position at Turing located?

This position is listed in Mumbai. The job description describes a fully remote working environment with 4 hours of PST overlap, so confirm any on-site expectations with Turing during the application process.

What does an AI Benchmark Engineer (Knowledge/Research) at Turing earn?

Turing has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.

When was the AI Benchmark Engineer (Knowledge/Research) role at Turing posted?

This role was posted on May 30, 2026 (9 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.
