
Senior Associate, Data Engineer - Emerging Businesses Advisory

PwC, Hitech City
Full Time, Senior
IN. Posted March 16, 2026


Job Description

Role Overview:

At PwC, we are looking for a highly skilled and motivated Data Engineer to join our growing data engineering team. As a Data Engineer, your primary responsibility will be designing, developing, and maintaining scalable data pipelines and data lake solutions using Azure Data Services, Apache Spark, and other Big Data technologies. You will collaborate closely with data scientists, analysts, and business stakeholders to ensure the availability, quality, and reliability of data assets for analytics and AI workloads.

Key Responsibilities:

  • Design, build, and maintain scalable and reliable data pipelines using Azure Data Factory, Azure Synapse, and Databricks.
  • Develop and optimize large-scale data processing jobs using PySpark and Spark SQL on Azure Databricks or Synapse Spark pools.
  • Manage and work with large datasets stored in data lakes such as ADLS Gen2 and integrate with enterprise data warehouses like SQL Server and Synapse.
  • Implement robust data transformation, cleansing, and aggregation logic to support analytics and reporting use cases.
  • Collaborate with BI developers, analysts, and data scientists to provision clean, reliable, and timely datasets.
  • Optimize data flows for performance and cost efficiency on Azure cloud platforms.
  • Implement data governance, lineage, and security practices across the data architecture.
  • Troubleshoot and resolve data pipeline failures, ensuring high availability and fault-tolerant design.
  • Participate in code reviews, adhere to version control practices, and maintain high coding standards.
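The transformation, cleansing, and aggregation responsibilities above can be sketched in plain Python; at the scale this role describes, equivalent logic would run as a PySpark job on Databricks or Synapse Spark pools. All field names and values here are illustrative, not taken from the posting:

```python
from collections import defaultdict

def clean_and_aggregate(rows):
    """Deduplicate records, drop those missing required keys, and total
    amounts per region. `rows` is a list of dicts with 'id', 'region',
    and 'amount' keys -- stand-ins for columns in a real pipeline."""
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        # Cleansing: skip records missing an id or region.
        if not row.get("id") or not row.get("region"):
            continue
        # Deduplication: keep only the first record per id.
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        # Aggregation: sum amounts by region.
        totals[row["region"]] += row.get("amount", 0.0)
    return dict(totals)

records = [
    {"id": "a1", "region": "IN", "amount": 100.0},
    {"id": "a1", "region": "IN", "amount": 100.0},  # duplicate, dropped
    {"id": "a2", "region": "IN", "amount": 50.0},
    {"id": None, "region": "US", "amount": 999.0},  # missing key, dropped
]
print(clean_and_aggregate(records))  # {'IN': 150.0}
```

In a PySpark pipeline the same steps would typically map to `dropna`, `dropDuplicates`, and a `groupBy().agg()` over a DataFrame read from ADLS Gen2.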

Qualifications Required:

  • Mandatory skill sets:
      • Azure Data Factory (ADF)
      • Azure Synapse Analytics
      • Azure Databricks
      • Apache Spark
      • PySpark
      • SQL Server, T-SQL
      • Synapse SQL
      • Azure Data Lake Storage Gen2
      • Big Data ecosystem knowledge (Parquet, Delta Lake, etc.)
      • Git and DevOps pipelines for data engineering
      • Performance tuning of Spark jobs and SQL queries
  • Preferred skill sets:
      • Python for data engineering workflows
      • Azure Monitor and Log Analytics for pipeline observability
      • Power BI data modeling, DAX, Delta Lake
      • Experience with CI/CD for data pipelines, YAML pipelines, and ADF integration
      • Knowledge of data quality tools and frameworks

Additional Company Details:

At PwC, we believe in providing equal employment opportunities without discrimination. We strive to create an environment where each individual can bring their true self and contribute to personal and firm growth, with zero tolerance for discrimination and harassment on any grounds.
