
Data Engineer (Scala)

Avance Consulting
Full Time · Mid-level
Bengaluru, Karnataka, IN · Posted March 23, 2026


Job Description

Location: Bengaluru

Job Summary

We are seeking a highly skilled and experienced Senior Scala Data Engineer to join our dynamic data team. In this role, you will be instrumental in designing, developing, and maintaining our next-generation data pipelines and platforms using Scala, Apache Spark, and cloud-native technologies. You will work on challenging problems involving large-scale data ingestion, transformation, and processing, contributing directly to our analytical capabilities and product features.

Key Responsibilities:

  • Design & Development: Architect, build, and optimize robust, scalable, and efficient data pipelines using Scala and Apache Spark (Spark Core, Spark SQL, Spark Streaming).
  • Data Ingestion: Develop solutions for ingesting high-volume, high-velocity data from various sources (e.g., relational databases, NoSQL databases, APIs, message queues like Kafka, log files) into our data lake/warehouse.
  • Data Transformation: Implement complex data transformations, aggregations, and feature engineering logic to prepare data for analytics, machine learning models, and operational systems.
  • Performance Optimization: Identify and resolve performance bottlenecks in Spark jobs and data pipelines, ensuring optimal resource utilization and execution times.
  • Data Quality & Governance: Implement data validation, monitoring, and alerting mechanisms to ensure data accuracy, completeness, and consistency. Contribute to data governance best practices.
  • Cloud Infrastructure: Leverage and optimize cloud services (e.g., AWS EMR/Glue, Azure Databricks/Synapse, GCP Dataproc/BigQuery) for data processing and storage.
  • Automation & Orchestration: Design and implement automated workflows for data pipelines using tools like Apache Airflow, AWS Step Functions, or similar.
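To make the ingest–transform–aggregate flow above concrete, here is a minimal sketch in plain Scala using standard collections. The `Event` type, its field names, and the CSV-style input format are invented for illustration; a real pipeline at this scale would express the same logic through Spark's Dataset/DataFrame API rather than in-memory sequences.

```scala
// Hypothetical event record -- field names are illustrative only.
case class Event(userId: String, category: String, amountCents: Long)

object PipelineSketch {
  // Ingest: parse raw comma-separated lines, silently dropping malformed
  // records (a stand-in for real source connectors such as Kafka consumers).
  def parse(lines: Seq[String]): Seq[Event] =
    lines.flatMap { line =>
      line.split(',') match {
        case Array(u, c, a) => a.toLongOption.map(Event(u, c, _))
        case _              => None
      }
    }

  // Transform + aggregate: total spend per category, the in-memory
  // analogue of a groupBy/sum in Spark SQL.
  def totalsByCategory(events: Seq[Event]): Map[String, Long] =
    events.groupBy(_.category)
      .view.mapValues(_.map(_.amountCents).sum)
      .toMap
}
```

The shape is deliberately the same as a Spark job: a parsing stage that tolerates bad records, followed by a keyed aggregation, so the logic ports over once the collections are replaced with distributed datasets.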

Required Qualifications:

  • Experience: 5+ years of professional experience in data engineering, with a strong focus on building large-scale data solutions.
  • Scala Expertise: Proven advanced proficiency in the Scala programming language.
  • Apache Spark: Deep hands-on experience with Apache Spark (Core, SQL, Streaming) for batch and real-time data processing.
  • Cloud Platforms: Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) and their relevant data services (e.g., AWS S3, EMR, Glue, Kinesis; Azure Data Lake, Databricks, Event Hubs; GCP GCS, Dataproc, Pub/Sub).
  • Data Warehousing: Strong understanding of data warehousing concepts, dimensional modeling (star/snowflake schemas), and ETL/ELT processes.
  • SQL: Expert-level SQL skills for data querying, manipulation, and optimization.
  • Distributed Systems: Experience working with distributed systems and understanding of their challenges (consistency, fault tolerance, concurrency).
  • Version Control: Proficiency with Git and collaborative development workflows.
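The dimensional-modeling requirement above (star schemas) can be illustrated with a minimal in-memory star schema: a fact table holding surrogate keys into dimension tables, joined at query time. All table and field names here are invented for illustration.

```scala
// Hypothetical star schema: one fact table, two dimension tables.
case class DimProduct(productKey: Int, name: String)
case class DimDate(dateKey: Int, isoDate: String)
case class FactSale(productKey: Int, dateKey: Int, revenueCents: Long)

object StarSchemaSketch {
  // Join fact rows to the product dimension and sum revenue per product,
  // the in-memory equivalent of:
  //   SELECT p.name, SUM(f.revenue_cents)
  //   FROM fact_sale f JOIN dim_product p ON f.product_key = p.product_key
  //   GROUP BY p.name
  def revenueByProduct(facts: Seq[FactSale], products: Seq[DimProduct]): Map[String, Long] = {
    val nameByKey = products.map(p => p.productKey -> p.name).toMap
    facts.groupBy(f => nameByKey(f.productKey))
      .view.mapValues(_.map(_.revenueCents).sum)
      .toMap
  }
}
```

Keeping measures (revenue) in narrow fact rows and descriptive attributes in dimensions is what makes this layout efficient to aggregate and cheap to scan.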

Nice-to-Haves:

  • Streaming Technologies: Experience with real-time streaming platforms like Apache Kafka, Apache Flink, or Kinesis.
  • Containerization & Orchestration: Experience with Docker, Kubernetes, and container orchestration for Spark applications.
  • Data Orchestration Tools: Hands-on experience with Apache Airflow, Dagster, Prefect, or similar workflow management tools.
  • NoSQL Databases: Experience with NoSQL databases such as Cassandra, MongoDB, DynamoDB, or HBase.
  • Data Lakehouse/Modern DW: Experience with technologies like Delta Lake, Apache Iceberg, Snowflake, Redshift, or BigQuery.
  • MLOps: Familiarity with MLOps principles and supporting data pipelines for machine learning models.
  • CI/CD: Experience setting up and maintaining CI/CD pipelines for data engineering projects.
  • Performance Tuning: Advanced knowledge of Spark performance tuning techniques, including memory management, shuffle optimization, and data partitioning strategies.
  • Certifications: Relevant cloud (AWS Certified Data Analytics, Azure Data Engineer Associate, GCP Professional Data Engineer) or Spark certifications.
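The data-partitioning strategy mentioned under performance tuning can be sketched without Spark: hash a record's key into one of N buckets so that rows sharing a key always co-locate, which is the same idea Spark's HashPartitioner applies during a shuffle. The names below are illustrative.

```scala
object PartitionSketch {
  // Assign a key to one of `numPartitions` buckets by hash code,
  // correcting for Java's negative-modulo behaviour.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Group keyed rows into partitions; equal keys always land together,
  // which is what makes per-key aggregation after a shuffle possible.
  def partition[A](rows: Seq[(String, A)], numPartitions: Int): Map[Int, Seq[(String, A)]] =
    rows.groupBy { case (k, _) => partitionFor(k, numPartitions) }
}
```

Skew follows directly from this picture: if one key dominates the data, its bucket dominates one partition, which is why repartitioning and key salting show up among common Spark tuning techniques.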
