Advanced ETL Developer (AWS & PySpark) (India)

InfoCepts
Full Time · Mid-level
IN · Posted March 7, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Scala, SQL, AWS, Snowflake, Git, GitHub, Spark, Airflow, Pandas, Agile, CI/CD, API

Job Description

Data Engineer - AWS & PySpark

Position: Data Engineer - AWS & PySpark

Location: Nagpur/Pune

Type of Employment: Full-Time

Purpose of the Position: You will be a critical member of the InfoCepts Cloud Data Architect Team. We are seeking an experienced Data Engineer with robust expertise in Databricks, PySpark, AWS, and Python to design and deliver scalable data pipelines, high-performance ETL frameworks, and reliable data solutions. The ideal candidate has a solid understanding of distributed data processing, cloud architecture, and modern data engineering best practices.

Key Result Areas and Activities:

Data Engineering & ETL Development

  • Design, build, and optimize ETL/ELT pipelines using PySpark/Scala and Databricks on large-scale distributed data environments.
  • Develop reusable data ingestion frameworks, transformation modules, and feature engineering pipelines.
  • Ensure high-quality data processing with robust data validation, error handling, and observability.

Databricks Platform Engineering

  • Work extensively with the Databricks Lakehouse platform—clusters, notebooks, Delta Lake, MLflow, jobs, and workflows.
  • Implement best practices for Delta Lake, including schema evolution, time travel, vacuuming, Z-Ordering, partitioning, and optimization.
  • Collaborate on job orchestration using Databricks Workflows, the Jobs API, or Airflow.

AWS Cloud Engineering

  • Build and maintain data pipelines leveraging AWS services such as S3, Glue, Lambda, IAM, Step Functions, Athena, Redshift or Snowflake, and CloudWatch.
  • Implement secure data architectures following IAM, networking, encryption, and cost-optimized design principles.
  • Integrate Databricks with AWS data sources and event-driven systems.
  • Working knowledge of open table formats (OTFs) such as Delta Lake and Iceberg.

Programming & Data Processing

  • Write high-quality, production-grade Python code (modular, optimized, reusable).
  • Develop PySpark jobs for batch and near real-time data transformations.
  • Optimize Spark performance (partitions, broadcast variables, caching, cluster tuning).
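
As an illustration of the modular, reusable transformation code described above, here is a minimal sketch using pandas (which the posting lists alongside PySpark); the dataset, table, and column names are hypothetical:

```python
import pandas as pd

def dedupe_latest(df: pd.DataFrame, key: str, ts: str) -> pd.DataFrame:
    """Keep only the most recent record per key (a typical ingestion step)."""
    return (
        df.sort_values(ts)
          .drop_duplicates(subset=[key], keep="last")
          .reset_index(drop=True)
    )

def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a revenue column from quantity and unit price."""
    out = df.copy()
    out["revenue"] = out["qty"] * out["unit_price"]
    return out

# Small hypothetical batch to exercise the pipeline
orders = pd.DataFrame({
    "order_id":   [1, 1, 2],
    "updated_at": ["2026-01-01", "2026-01-02", "2026-01-01"],
    "qty":        [2, 3, 1],
    "unit_price": [10.0, 10.0, 5.0],
})

# Small composable steps chained with .pipe keep each transformation testable in isolation
result = orders.pipe(dedupe_latest, key="order_id", ts="updated_at").pipe(add_revenue)
print(result[["order_id", "revenue"]])
```

The same structure carries over to PySpark: each step becomes a function from DataFrame to DataFrame, chained with `transform`, so individual stages can be unit-tested without the cluster.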

Data Architecture, Governance & Quality

  • Contribute to the design of data models, storage layers, and data lifecycle management.
  • Implement best practices for data governance, metadata management, and lineage tracking.
  • Ensure data reliability, performance, and accuracy across multiple environments.

Cross-Functional Collaboration

  • Partner with analysts, data scientists, product teams, and business stakeholders to understand requirements.
  • Document workflows, maintain Git-based version control, and participate in architecture reviews.
  • Support production pipelines, troubleshoot issues, and continuously enhance system performance.

Roles & Responsibilities

Essential Skills:

  • 5+ years of hands-on experience in Data Engineering.
  • Strong expertise in PySpark and distributed data processing.
  • Deep understanding of Databricks Lakehouse (Delta Lake, clusters, jobs, Workflows, MLflow).
  • Proficiency in the AWS data ecosystem (S3, Glue, Redshift, Lambda, Step Functions, EMR).
  • Strong programming proficiency in Python (pandas, PySpark, APIs, modular code).
  • Solid SQL skills (analytical functions, performance tuning).
  • Experience with Git, CI/CD basics, and production deployments.
  • Experience working with AI-based productivity tools such as GitHub Copilot.
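
To make the "analytical functions" expectation concrete, here is a small window-function query of the kind this role involves, run against SQLite for portability (the `sales` table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 200), ('west', 50);
""")

# Rank sales within each region using an analytical (window) function
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()

for r in rows:
    print(r)
```

The same `RANK() OVER (PARTITION BY ... ORDER BY ...)` pattern works unchanged in Spark SQL, Redshift, and Snowflake.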

Desirable Skills:

  • Familiarity with Unity Catalog, governance, and fine-grained access controls.
  • Experience with Airflow or other orchestration tools.
  • Knowledge of Databricks SQL dashboards and visualization.
  • Exposure to ML/AI workflows on Databricks (not mandatory).

Qualifications

  • Bachelor’s degree in computer science, engineering, or a related field
  • Demonstrated continued learning through one or more technical certifications or related methods
  • 5+ years of relevant experience in Data Analytics

Qualities:

  • Strong problem-solving mindset with attention to detail.
  • Ability to work in agile, cross-functional, distributed teams.
  • Excellent communication, documentation, and collaboration skills.
  • Ownership-driven, proactive, and committed to delivering high-quality outcomes.
