Senior Data Engineer (Python - PySpark and AWS)

Toronto, Ontario, CA

Posted 7 weeks ago

Role Overview

Collabera is hiring a Senior Data Engineer (Python - PySpark and AWS) as part of its Data Engineering hiring. This is a contract role in Toronto. Full responsibilities, required qualifications, and the apply link are listed in the description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, AWS, Apache, Spark, Airflow, EMR, OR, Benefits

Job Description

Title: Senior Data Engineer

Client: Investments Industry

# of Openings: 1

Type: 6-Month Contract (High likelihood of extension)

Location: Toronto, ON

Work Model: 4 days/week onsite, Friday WFH

Pay Rate: $80-100/hr

Role Overview

We are seeking a Senior Data Engineer (8-10+ years of experience) to support a large-scale data platform transformation within the Total Fund Management (TFM) team.
This role will focus on migrating and modernizing existing Databricks-based pipelines to AWS (EMR Spark), with an initial lift-and-shift phase, followed by optimization and redesign into scalable, consumable data products.
This is a highly autonomous, hands-on role requiring strong PySpark expertise, deep experience with distributed data systems, and the ability to navigate complex, multi-source datasets (including market and reference data vendors).

Day-to-Day Responsibilities

Migrate existing Databricks-based Spark pipelines to AWS EMR (Spark)
Perform lift-and-shift of ~50+ datasets, some with high complexity and multiple data sources
Refactor and optimize data pipelines for performance, scalability, and reliability
Structure and store data using Parquet and Iceberg formats
Improve and clean up legacy data pipelines built over several years
Design data with a consumption-first mindset (e.g., partitioning strategies, access patterns, data usability)
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions
Ensure production readiness including monitoring, orchestration, and deployment
Work independently to drive delivery from design through implementation

Key Responsibilities

Develop and optimize large-scale PySpark data pipelines
Rebuild and enhance Spark workloads in AWS (EMR)
Leverage tools such as Airflow, AWS Glue, and Lake Formation
Handle parallel/distributed data processing workloads
Improve system performance and data quality across pipelines
Engage with business and technical stakeholders to align on data needs
Own delivery with minimal oversight in a fast-paced environment

Must-Haves

8-10+ years of Data Engineering experience (senior-level profiles only)
Strong hands-on expertise in Python and PySpark
Deep experience with Apache Spark in distributed environments
Proven experience working with large-scale, complex data pipelines
Experience with Databricks (existing environment)
Strong knowledge of Parquet and Iceberg data formats
Experience with AWS data ecosystem (EMR preferred)
Familiarity with Airflow, Glue, and Lake Formation
Strong understanding of parallel/distributed data processing
Ability to work independently with strong problem-solving skills
Experience in ambiguous environments with evolving requirements

Nice-to-Haves

Prior experience in capital markets or investment management
Experience working with market data / reference data vendors
Experience designing data products and consumption layers
Exposure to large-scale data platform migrations or transformations

We may use AI-enabled and/or automated tools to support parts of our recruitment process, including application screening, interview scheduling, and candidate communications. These tools are used to enhance consistency and efficiency. All hiring decisions involve human review and are not based solely on automated processing.

The Company offers a total rewards package in accordance with all applicable federal, provincial, and local laws and requirements. Benefit eligibility and offerings vary based on role, employment status, and work location. For contractor positions, benefits are limited to those entitlements and protections required by applicable law, which may include (as applicable) vacation pay, public holidays, leaves of absence, and other legally mandated benefits or payments.

About Collabera

Collabera

collabera.com

Data Engineering

On-site


Frequently Asked Questions

How do I apply for the Senior Data Engineer (Python - PySpark and AWS) position at Collabera?

Use the Apply button above to submit your application directly to Collabera. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Where is the Senior Data Engineer (Python - PySpark and AWS) position at Collabera located?

This position is based in Toronto. The posting lists a work model of 4 days per week onsite with Fridays worked from home, so candidates should plan for mostly on-site work.

What does a Senior Data Engineer (Python - PySpark and AWS) at Collabera earn?

The posting lists $80-100/hr for this 6-month contract. You can confirm the details during a recruiter screen if compensation transparency is important to you.

When was the Senior Data Engineer (Python - PySpark and AWS) role at Collabera posted?

This role was posted on May 11, 2026 (51 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.

How much experience does the Senior Data Engineer (Python - PySpark and AWS) role at Collabera require?

This is a senior-level position. The posting asks for 8-10+ years of Data Engineering experience and considers senior-level profiles only. Review the must-have qualifications in the job description closely before applying.