Senior Scientific Data Platform Engineer (Joint Genome Institute)

Berkeley Lab
Full Time · Senior
California, US · Posted February 24, 2026

Job Description

Berkeley Lab’s (LBNL) Joint Genome Institute (JGI) has an opening for a Senior Scientific Data Platform Engineer to play a critical role in transforming raw scientific outputs into high-value, AI-ready data assets that directly support JGI's mission.

In this exciting role, you will analyze scientific use-cases and data management challenges, and design robust, automated, and cost-effective solutions to address them. Working at the intersection of scientific data and the data lakehouse platform, you will contribute to the development and maintenance of JGI’s Data Lakehouse effort, leading ongoing integration work to ensure scientific data is well-structured, accessible, and optimized for use by domain scientists and downstream AI applications.

The JGI’s mission is to provide the global research community with access to the most advanced integrative genome science capabilities in support of the DOE’s research mission to solve the world’s evolving energy and environmental challenges. The JGI supports projects in genome sequencing, synthesis, transcriptomics, metabolomics, and natural products in plants, fungi, algae, and microorganisms. This position is headquartered on the Lab’s main site at the Integrative Genomics Building (IGB).

This position has an anticipated start date of April 1, 2026.

We’re here for the same mission: to bring science solutions to the world. Join our team and you will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

  • Exceptional health and retirement benefits, including pension or 401K-style plans
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we have a Winter Holiday Shutdown every year.
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance

What You Will Do:

  • Analyze and evaluate complex scientific use-cases and design automated system solutions.
  • Provide technical expertise in identifying, evaluating, and developing cost-effective systems and procedures that meet user requirements.
  • Lead the design and implementation of data integration processes for the JGI's Data Lakehouse, ensuring large scientific datasets are structured for efficient querying and analysis.
  • Design, build, and maintain fault-tolerant, scalable, and efficient Extract, Transform, Load (ETL) data pipelines to ingest, transform, and load genomic data and associated metadata into the Data Lakehouse.
  • Configure system settings and options.
  • Plan and perform unit, integration, and acceptance testing.
  • Create system specifications aligned with business requirements.
  • Provide consultation and guidance to domain scientists and other users on the use of automated systems.
  • Collaborate closely with cross-functional teams to resolve business and system-related issues.
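The ETL duties above can be sketched in miniature. The snippet below is a hedged illustration, not JGI's actual pipeline: it ingests raw sequencing-run records, normalizes them with Pandas, and prepares a typed table that could be written to Parquet for lakehouse-style querying. All names, columns, and values are illustrative assumptions.

```python
# Minimal ETL sketch (illustrative only): extract raw records, transform them
# into a clean, typed table, and (optionally) load to Parquet.
import pandas as pd

def build_sample_table(records: list[dict]) -> pd.DataFrame:
    """Extract + transform: normalize hypothetical sequencing-run records."""
    df = pd.DataFrame.from_records(records)
    # Standardize organism names and enforce typed columns for efficient querying.
    df["organism"] = df["organism"].str.strip().str.lower()
    df["read_count"] = df["read_count"].astype("int64")
    return df.sort_values("sample_id").reset_index(drop=True)

records = [
    {"sample_id": "S2", "organism": " E. coli ", "read_count": "120000"},
    {"sample_id": "S1", "organism": "A. thaliana", "read_count": "98000"},
]
table = build_sample_table(records)
# Load step: Parquet is a common columnar format for lakehouse tables.
# table.to_parquet("samples.parquet", index=False)  # requires pyarrow
```

A real pipeline at this scale would run such transforms in PySpark and register the output as an Iceberg or Dremio-managed table, but the extract/transform/load shape is the same.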

What Is Required:

  • A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Data Engineering, or a related technical field and a minimum of 8 years of demonstrated experience structuring large-scale datasets for efficient use in Data Lakehouse environments, leveraging technologies such as Parquet, Iceberg, Dremio, Spark, or similar lakehouse and data warehousing platforms; or an equivalent combination of education and experience.
  • Demonstrated proficiency with modern Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) tools and frameworks.
  • Strong scripting skills in data engineering languages, including Python (with Pandas and PySpark) and advanced SQL for data manipulation and performance optimization.
  • Strong analytical skills, including the ability to identify problems, troubleshoot, and demonstrate good judgment in selecting methods and techniques for obtaining solutions.
  • Excellent oral and written communication skills, including experience organizing and presenting technical information to varying audiences.
  • Demonstrated interpersonal skills including experience collaborating with an interdisciplinary research team.

Desired Qualifications:

  • A Master’s Degree (or equivalent knowledge/training) in Computer Science, Data Engineering, or a related technical field.
  • Experience with Data Lakehouse technologies like Dremio or Spark.
  • Domain knowledge of genomics data.

Additional Information:

  • Application Date: Priority consideration will be given to candidates who apply with a resume and cover letter by March 9, 2026. Applications will be accepted until the job posting is removed.
  • Appointment Type: This is a full time, exempt from overtime pay (monthly paid), 2 year (benefits eligible), Term appointment with the possibility of extension or conversion to Career appointment based upon satisfactory job performance, continuing availability of funds, and ongoing operational needs.
