Job Description
Data Engineer - AWS & PySpark
Position: Data Engineer - AWS & PySpark
Location: Nagpur/Pune
Type of Employment: Full-Time
Purpose of the Position: You will be a critical member of the InfoCepts Cloud Data Architect Team. We are seeking an experienced Data Engineer with robust expertise in Databricks, PySpark, AWS, and Python to design and deliver scalable data pipelines, high-performance ETL frameworks, and reliable data solutions. The ideal candidate has a solid understanding of distributed data processing, cloud architecture, and modern data engineering best practices.
Key Result Areas and Activities:
Data Engineering & ETL Development
- Design, build, and optimize ETL/ELT pipelines using PySpark/Scala and Databricks on large-scale distributed data environments.
- Develop reusable data ingestion frameworks, transformation modules, and feature engineering pipelines.
- Ensure high-quality data processing with robust data validation, error handling, and observability.
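As a minimal illustration of the validation and error-handling expectations above, here is a pure-Python sketch of row-level validation with a quarantine stream. The record schema (`order_id`, `amount`) and the rules are illustrative assumptions, not part of the job description; in practice this logic would typically run inside a PySpark job.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors for one record (empty list = valid)."""
    errors = []
    if not record.get("order_id"):
        errors.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors

def split_valid_invalid(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Route records into a clean stream and a quarantine stream with reasons."""
    valid, invalid = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            # Keep the bad record plus the reasons, for observability/replay.
            invalid.append({**rec, "_errors": errs})
        else:
            valid.append(rec)
    return valid, invalid
```

Routing failures into a quarantine stream rather than dropping them silently is what makes a pipeline observable and debuggable.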
Databricks Platform Engineering
- Work extensively with the Databricks Lakehouse platform—clusters, notebooks, Delta Lake, MLflow, jobs, and workflows.
- Implement best practices for Delta Lake, including schema evolution, time travel, vacuuming, Z-Ordering, partitioning, and optimization.
- Collaborate on job orchestration using Databricks Workflows, the Jobs API, or Airflow.
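As a rough illustration of the Databricks Workflows orchestration mentioned above, a job definition submitted to the Jobs API (2.1) might look like the following sketch. The task names, notebook paths, and cluster settings are hypothetical.

```json
{
  "name": "daily_ingest_pipeline",
  "tasks": [
    {
      "task_key": "ingest_raw",
      "notebook_task": { "notebook_path": "/pipelines/ingest_raw" },
      "job_cluster_key": "etl_cluster"
    },
    {
      "task_key": "transform_silver",
      "depends_on": [ { "task_key": "ingest_raw" } ],
      "notebook_task": { "notebook_path": "/pipelines/transform_silver" },
      "job_cluster_key": "etl_cluster"
    }
  ],
  "job_clusters": [
    {
      "job_cluster_key": "etl_cluster",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  }
}
```

The `depends_on` field expresses the task DAG, and sharing one `job_cluster_key` lets sequential tasks reuse a cluster instead of provisioning per task.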
AWS Cloud Engineering
- Build and maintain data pipelines leveraging AWS services such as:
- S3, Glue, Lambda, IAM, Step Functions, Athena, Redshift or Snowflake, CloudWatch
- Implement secure data architectures following IAM, networking, encryption, and cost-optimized design principles.
- Integrate Databricks with AWS data sources and event-driven systems.
- Working knowledge of open table formats (OTFs) such as Delta Lake and Apache Iceberg.
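To illustrate the event-driven integration pattern above, here is a minimal sketch of an S3-triggered AWS Lambda handler. It parses the standard S3 event notification shape; the downstream action is a placeholder (a real handler might, for example, call the Databricks Jobs API), and the bucket/key names in any usage are hypothetical.

```python
def handler(event: dict, context=None) -> dict:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            objects.append({"bucket": bucket, "key": key})
    # Placeholder: a real pipeline would trigger downstream processing here,
    # e.g. submitting a Databricks job run for the new objects.
    return {"status": "ok", "objects": objects}
```

Keeping the handler a pure function over the event payload makes it trivial to unit-test without any AWS infrastructure.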
Programming & Data Processing
- Write high-quality, production-grade Python code (modular, optimized, reusable).
- Develop PySpark jobs for batch and near real-time data transformations.
- Optimize Spark performance (partitions, broadcast variables, caching, cluster tuning).
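As a small sketch of the "modular, optimized, reusable" expectation, transformation steps can be written as composable pure functions. The step names and record shapes below are illustrative assumptions, not from the posting.

```python
from functools import reduce
from typing import Callable

# A Transform maps a batch of records to a batch of records.
Transform = Callable[[list[dict]], list[dict]]

def drop_nulls(field: str) -> Transform:
    """Reusable step: remove records whose `field` is missing or None."""
    return lambda rows: [r for r in rows if r.get(field) is not None]

def rename(old: str, new: str) -> Transform:
    """Reusable step: rename a field on every record."""
    return lambda rows: [{**{k: v for k, v in r.items() if k != old},
                          **({new: r[old]} if old in r else {})} for r in rows]

def pipeline(*steps: Transform) -> Transform:
    """Compose steps left-to-right into a single reusable transformation."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)
```

The same composition idea carries over to PySpark, where each step would take and return a DataFrame and be chained via `DataFrame.transform`.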
Data Architecture, Governance & Quality
- Contribute to the design of data models, storage layers, and data lifecycle management.
- Implement best practices for data governance, metadata management, and lineage tracking.
- Ensure data reliability, performance, and accuracy across multiple environments.
Cross-Functional Collaboration
- Partner with analysts, data scientists, product teams, and business stakeholders to understand requirements.
- Document workflows, maintain Git-based version control, and participate in architecture reviews.
- Support production pipelines, troubleshoot issues, and continuously enhance system performance.
Roles & Responsibilities
Essential Skills:
- 5+ years of hands-on experience in Data Engineering.
- Strong expertise in PySpark and distributed data processing.
- Deep understanding of Databricks Lakehouse (Delta Lake, clusters, jobs, Workflows, MLflow).
- Proficiency in the AWS data ecosystem (S3, Glue, Redshift, Lambda, Step Functions, EMR).
- Strong programming proficiency in Python (pandas, PySpark, APIs, modular code).
- Solid SQL skills (analytical functions, performance tuning).
- Experience with Git, CI/CD basics, and production deployments.
- Experience working with AI-based productivity tools such as GitHub Copilot.
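The SQL analytical-function skill listed above can be sketched with a window function. This example uses Python's built-in `sqlite3` module (window functions require SQLite 3.25+, bundled with modern Python builds); the table and data are hypothetical.

```python
import sqlite3

# Rank each customer's orders by amount using an analytical (window) function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 50.0), ("alice", 120.0), ("bob", 75.0)])

rows = conn.execute("""
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
""").fetchall()
```

`PARTITION BY` restarts the ranking per customer, which is the kind of analytical pattern (top-N per group, running totals, deduplication by `ROW_NUMBER`) this skill refers to.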
Desirable Skills:
- Familiarity with Unity Catalog, governance, and fine-grained access controls.
- Experience with Airflow or other orchestration tools.
- Knowledge of Databricks SQL dashboards and visualization.
- Exposure to ML/AI workflows on Databricks (not mandatory).
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Demonstrated continued learning through one or more technical certifications or related methods
- 5+ years of relevant experience in Data Analytics
Qualities:
- Strong problem-solving mindset with attention to detail.
- Ability to work in agile, cross-functional, distributed teams.
- Excellent communication, documentation, and collaboration skills.
- Ownership-driven, proactive, and committed to delivering high-quality outcomes.