
Supercharge your data discovery.



  • Combine Kids First data with your own
  • Consistent clinical terms across conditions
  • Harmonized genomic variants available immediately


  • Build cohorts and explore variants rapidly
  • Cloud-based analysis in CAVATICA
  • Investigate outcomes in PedcBioPortal


  • Uncover genetic predisposition for conditions in familial trios
  • Identify tumor oncogenes for drug development
  • Validate discoveries in model organisms to human health
Kids First

About the Data

Kids First studies on the portal are ready for analysis, having been harmonized and curated by the Kids First Data Resource Center team. These experts apply deep experience with pediatric data and take community feedback into account. Kids First data is functionally equivalent to other extensive genomic efforts such as GTEx and the NCI Genomic Data Commons.

Variants of interest and genomic data are paired with clinical data for quick analysis startup.

Robust data includes SNVs, CNVs, and SVs annotated with Cancer Hotspots. Trio-structured germline data allows for de novo discoveries within families. These are produced by publicly available bioinformatic pipelines that are tested, documented, and accessible. Conditions are harmonized to be easily searched, regardless of the data language you speak. The Human Phenotype Ontology (HPO) for phenotypes and MONDO Ontology for diagnoses enable cohort discovery on the Kids First Portal.

Data Features



Germline variants for each participant in gVCF format for rapid joint genotyping


Somatic variant pipeline calling SNVs, CNVs, and SVs for cancer studies


Trio-based joint-called variants to identify de novo mutations in congenital disorder studies



Phenotypes mapped to the Human Phenotype Ontology (HPO)


Diagnoses mapped to the MONDO Disease Ontology (MONDO)


Search and build a cohort on the Kids First Data Resource Portal

Data Modalities


Whole Genome Sequencing


RNA Sequencing


Whole Exome Sequencing


Linked-Read Whole Genome Sequencing


Long Reads Sequencing

Access the Studies

The Kids First Data Resource Portal is a collection of studies from various investigators who perform disease-specific research. Although these studies originated as separate research projects, the goal of collecting and sharing them is to enable other investigators to combine them and build new studies and research on the data already collected.

Explore the many disease areas represented in datasets currently available through the portal. Additional conditions are anticipated to be added through future Kids First opportunities.

Kids First Data

Your Questions Answered


What can you find on this website?

You can find out about the Gabriella Miller Kids First Data Resource Center, and sign up to access data, tools, and resources offered through Kids First.

Access pediatric cancer and congenital disorder data here.

How is Kids First Data Collected?

Researchers contribute tens of thousands of patient DNA samples collected from blood, tissue, and saliva to be sequenced and integrated with patient clinical data in the Kids First DRC.

In addition, patient families can partner with researchers by participating in studies seeking cures for childhood cancer and congenital disorders.

How to attribute Kids First Data in Publications

In addition to listing the PHS Accession Number(s) of the datasets used for a particular analysis and the databases from which they are accessible to the research community, X01 investigator teams (i.e., “Contributing Investigator(s)”) are asked to describe support for the project, including NIH grant numbers.

Secondary users, or “end users,” must acknowledge all datasets used in a publication or analysis by listing all relevant dbGaP PHS Accession Numbers and the URLs of the databases where the datasets were accessed. The Data Use Certification (DUC) agreed to by secondary users outlines how to use and acknowledge each approved dataset.

Is there a sample statement for the acknowledgment?

Yes! See below.

The results analyzed and here are based in whole or in part upon data generated by Gabriella Miller Kids First Pediatric Research Program (Kids First) projects and are accessible through the Kids First Data Resource Portal ( and/or dbGaP ( Kids First was supported by the Common Fund of the Office of the Director of the National Institutes of Health ( The was awarded a U24 () to sequence [childhood cancer and/or structural birth defect cohort samples] submitted by investigators through the Kids First program (). Additional funds supported assembling the cohorts, collecting the phenotypic data and samples, and/or data analysis.

Contributing investigators include: *.

*If there are many collaborators/consortium members, you can use a ‘corporate authorship’ with a link to a website that lists everyone.

Kids First requires that researchers share genomic data generated by NIH funds. Learn more about the transformative Genomic Data Sharing Policy here.
