
Data Engineer (Scala)

Avance Consulting
Full Time · Mid-level
Bengaluru, Karnataka, IN · Posted March 23, 2026


Job Description

Location: Bengaluru

Job Summary

We are seeking a highly skilled and experienced Senior Scala Data Engineer to join our dynamic data team. In this role, you will be instrumental in designing, developing, and maintaining our next-generation data pipelines and platforms using Scala, Apache Spark, and cloud-native technologies. You will work on challenging problems involving large-scale data ingestion, transformation, and processing, contributing directly to our analytical capabilities and product features.

Key Responsibilities:

  • Design & Development: Architect, build, and optimize robust, scalable, and efficient data pipelines using Scala and Apache Spark (Spark Core, Spark SQL, Spark Streaming).
  • Data Ingestion: Develop solutions for ingesting high-volume, high-velocity data from various sources (e.g., relational databases, NoSQL databases, APIs, message queues like Kafka, log files) into our data lake/warehouse.
  • Data Transformation: Implement complex data transformations, aggregations, and feature engineering logic to prepare data for analytics, machine learning models, and operational systems.
  • Performance Optimization: Identify and resolve performance bottlenecks in Spark jobs and data pipelines, ensuring optimal resource utilization and execution times.
  • Data Quality & Governance: Implement data validation, monitoring, and alerting mechanisms to ensure data accuracy, completeness, and consistency. Contribute to data governance best practices.
  • Cloud Infrastructure: Leverage and optimize cloud services (e.g., AWS EMR/Glue, Azure Databricks/Synapse, GCP Dataproc/BigQuery) for data processing and storage.
  • Automation & Orchestration: Design and implement automated workflows for data pipelines using tools like Apache Airflow, AWS Step Functions, or similar.
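To make the ingest–transform–aggregate flow above concrete, here is a minimal sketch in plain Scala using standard collections. The `Event` type, its field names, and the CSV-style input format are invented for illustration; a real pipeline at this scale would express the same logic through Spark's Dataset/DataFrame API rather than in-memory sequences.

```scala
// Hypothetical event record -- field names are illustrative only.
case class Event(userId: String, category: String, amountCents: Long)

object PipelineSketch {
  // Ingest: parse raw comma-separated lines, silently dropping malformed
  // records (a stand-in for real source connectors such as Kafka consumers).
  def parse(lines: Seq[String]): Seq[Event] =
    lines.flatMap { line =>
      line.split(',') match {
        case Array(u, c, a) => a.toLongOption.map(Event(u, c, _))
        case _              => None
      }
    }

  // Transform + aggregate: total spend per category, the in-memory
  // analogue of a groupBy/sum in Spark SQL.
  def totalsByCategory(events: Seq[Event]): Map[String, Long] =
    events.groupBy(_.category)
      .view.mapValues(_.map(_.amountCents).sum)
      .toMap
}
```

The shape is deliberately the same as a Spark job: a parsing stage that tolerates bad records, followed by a keyed aggregation, so the logic ports over once the collections are replaced with distributed datasets.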

Required Qualifications:

  • Experience: 5+ years of professional experience in data engineering, with a strong focus on building large-scale data solutions.
  • Scala Expertise: Proven advanced proficiency in the Scala programming language.
  • Apache Spark: Deep hands-on experience with Apache Spark (Core, SQL, Streaming) for batch and real-time data processing.
  • Cloud Platforms: Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) and their relevant data services (e.g., AWS S3, EMR, Glue, Kinesis; Azure Data Lake, Databricks, Event Hubs; GCP GCS, Dataproc, Pub/Sub).
  • Data Warehousing: Strong understanding of data warehousing concepts, dimensional modeling (star/snowflake schemas), and ETL/ELT processes.
  • SQL: Expert-level SQL skills for data querying, manipulation, and optimization.
  • Distributed Systems: Experience working with distributed systems and understanding of their challenges (consistency, fault tolerance, concurrency).
  • Version Control: Proficiency with Git and collaborative development workflows.
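The dimensional-modeling requirement above (star schemas) can be illustrated with a minimal in-memory star schema: a fact table holding surrogate keys into dimension tables, joined at query time. All table and field names here are invented for illustration.

```scala
// Hypothetical star schema: one fact table, two dimension tables.
case class DimProduct(productKey: Int, name: String)
case class DimDate(dateKey: Int, isoDate: String)
case class FactSale(productKey: Int, dateKey: Int, revenueCents: Long)

object StarSchemaSketch {
  // Join fact rows to the product dimension and sum revenue per product,
  // the in-memory equivalent of:
  //   SELECT p.name, SUM(f.revenue_cents)
  //   FROM fact_sale f JOIN dim_product p ON f.product_key = p.product_key
  //   GROUP BY p.name
  def revenueByProduct(facts: Seq[FactSale], products: Seq[DimProduct]): Map[String, Long] = {
    val nameByKey = products.map(p => p.productKey -> p.name).toMap
    facts.groupBy(f => nameByKey(f.productKey))
      .view.mapValues(_.map(_.revenueCents).sum)
      .toMap
  }
}
```

Keeping measures (revenue) in narrow fact rows and descriptive attributes in dimensions is what makes this layout efficient to aggregate and cheap to scan.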

Nice-to-Haves:

  • Streaming Technologies: Experience with real-time streaming platforms like Apache Kafka, Apache Flink, or Kinesis.
  • Containerization & Orchestration: Experience with Docker, Kubernetes, and container orchestration for Spark applications.
  • Data Orchestration Tools: Hands-on experience with Apache Airflow, Dagster, Prefect, or similar workflow management tools.
  • NoSQL Databases: Experience with NoSQL databases such as Cassandra, MongoDB, DynamoDB, or HBase.
  • Data Lakehouse/Modern DW: Experience with technologies like Delta Lake, Apache Iceberg, Snowflake, Redshift, or BigQuery.
  • MLOps: Familiarity with MLOps principles and supporting data pipelines for machine learning models.
  • CI/CD: Experience setting up and maintaining CI/CD pipelines for data engineering projects.
  • Performance Tuning: Advanced knowledge of Spark performance tuning techniques, including memory management, shuffle optimization, and data partitioning strategies.
  • Certifications: Relevant cloud (AWS Certified Data Analytics, Azure Data Engineer Associate, GCP Professional Data Engineer) or Spark certifications.
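The data-partitioning strategy mentioned under performance tuning can be sketched without Spark: hash a record's key into one of N buckets so that rows sharing a key always co-locate, which is the same idea Spark's HashPartitioner applies during a shuffle. The names below are illustrative.

```scala
object PartitionSketch {
  // Assign a key to one of `numPartitions` buckets by hash code,
  // correcting for Java's negative-modulo behaviour.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Group keyed rows into partitions; equal keys always land together,
  // which is what makes per-key aggregation after a shuffle possible.
  def partition[A](rows: Seq[(String, A)], numPartitions: Int): Map[Int, Seq[(String, A)]] =
    rows.groupBy { case (k, _) => partitionFor(k, numPartitions) }
}
```

Skew follows directly from this picture: if one key dominates the data, its bucket dominates one partition, which is why repartitioning and key salting show up among common Spark tuning techniques.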
