
The Open Pediatric Cancer Project

Background: In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we present an extension of the OpenPBTA called the Open Pediatric Cancer (OpenPedCan) Project, a harmonized, open-source multiomic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA sequencing (RNA-seq) data from the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) projects, OpenPedCan contains nearly 48,000 biospecimens in total (24,002 tumor and 23,893 normal specimens).

Findings: We used Gabriella Miller Kids First workflows to harmonize whole-genome sequencing (WGS), whole-exome sequencing (WXS), RNA-seq, and targeted sequencing datasets, yielding somatic SNVs, indels, copy number variants, structural variants, RNA expression, fusions, and splice variants. We also integrated summarized Clinical Proteomic Tumor Analysis Consortium whole-cell proteomics and phosphoproteomics data and miRNA sequencing data, and we developed a methylation array harmonization workflow that produces M-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) that deliver harmonized and processed data from over 60 scalable modules, which can be run both locally and on AWS. The processed data are released in versioned form, are accessible through CAVATICA or AWS S3 download (from GitHub), and are queryable through PedcBioPortal and the National Cancer Institute’s pediatric Molecular Targets Platform. Notably, we have expanded Pediatric Brain Tumor Atlas molecular subtyping to include methylation information, aligning it with the World Health Organization 2021 Central Nervous System Tumor classifications and allowing us to create research-grade integrated diagnoses for these tumors.
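The methylation workflow reports both M-values and beta-values. The minimal sketch below shows how the two scales relate. It assumes the commonly used offset-free logit relationship, M = log2(β / (1 − β)). The OpenPedCan workflow may compute these values differently, for example by adding small offsets to the methylated and unmethylated intensities. The function names `beta_to_m` and `m_to_beta` are hypothetical and are not part of any OpenPedCan module.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a methylation beta-value to an M-value.

    Assumes the offset-free relationship M = log2(beta / (1 - beta));
    beta must lie strictly between 0 and 1, or the logit is undefined.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m: beta = 2**M / (2**M + 1)."""
    power = 2.0 ** m
    return power / (power + 1.0)


if __name__ == "__main__":
    # A CpG site that is half methylated (beta = 0.5) maps to M = 0.
    print(beta_to_m(0.5))
    # Round-trip a hypotheses-free example value through both transforms.
    print(m_to_beta(beta_to_m(0.8)))
```

The M-value scale is unbounded and roughly symmetric around 0, which is why it is often preferred for statistical testing. Beta-values, bounded between 0 and 1, are easier to interpret as an approximate fraction of methylated signal.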

Conclusions: OpenPedCan data and its reproducible analysis module framework are openly available and can be utilized and/or adapted by researchers to accelerate discovery, validation, and clinical translation.
