Senior Data Engineer - AI Product Validation Specialist

SparxIT
Full Time, Senior
Noida, Uttar Pradesh, IN
Posted April 8, 2026

Job Description

Location: Noida, On-Site

Experience: 6+ Years

Employment Type: Full-Time

Company: ProactiveAI (by Sparx IT Solutions Pvt Ltd)

About the Role

We are building an AI product that aims to automate and replace the traditional human data engineer role. We are seeking a highly skilled Senior Data Engineer to serve as our AI Product Validation Specialist. In this role, you will act as the domain expert and "gold standard" benchmark, validating the accuracy and quality of our AI-generated data engineering outputs. Your expertise will directly shape how our AI product understands and executes complex data engineering tasks, from SQL query generation to ETL pipeline design.

This is a unique, forward-looking role at the intersection of data engineering and AI product development. You will work closely with our AI/ML engineering team to evaluate, critique, and improve the AI's ability to produce production-grade data engineering artifacts.

Key Responsibilities:

  • Validate AI-generated SQL queries, Python/PySpark code, and ETL pipeline designs for correctness, performance, and best practices
  • Design and execute evaluation frameworks to measure AI product accuracy across diverse data engineering scenarios
  • Identify edge cases, logical errors, and failure modes in AI-generated data engineering outputs
  • Compare AI-generated outputs against human expert baselines to define quality benchmarks
  • Collaborate with the AI/ML team to provide feedback that improves model outputs and reduces error rates
  • Define acceptance criteria and quality standards for AI-generated data engineering artifacts
  • Build test datasets and scenarios that stress-test the AI product's capabilities
  • Review and document AI product limitations, providing structured feedback for iterative improvement
  • Work with cross-functional teams including product, engineering, and AI research

Required Skills & Qualifications:

  • 6+ years of hands-on experience as a Data Engineer building production-grade data pipelines
  • Expert-level proficiency in SQL (complex queries, query optimization, query plan analysis)
  • Strong proficiency in Python and PySpark for data engineering workloads
  • Deep understanding of ETL/ELT patterns, data modeling (dimensional modeling, star/snowflake schemas), and data warehouse design
  • Experience with distributed data processing frameworks (Apache Spark, Databricks)
  • Solid understanding of data pipeline orchestration tools (Airflow, Prefect, or similar)
  • Experience with cloud data platforms (AWS, Azure, or GCP)
  • Strong ability to review and critique code for correctness, efficiency, and maintainability
  • Excellent analytical skills with an eye for spotting subtle logical errors in data transformations
  • Experience working in a collaborative team environment with engineering and product teams

Preferred Qualifications

  • Experience with streaming technologies (Kafka, Kinesis, or similar)
  • Familiarity with dbt (data build tool) and modern data stack practices
  • Exposure to AI/LLM technologies or experience evaluating AI-generated code
  • Experience with containerization (Docker, Kubernetes) and CI/CD
  • Background in data quality, data governance, or data observability
