Overview

This page describes each available dataset, with release notes, for the searchable and downloadable data in the Kids First Data Resource Portal. Users requesting access to controlled data are required to have an eRA Commons account. Access to each dataset is managed by dbGaP or by consortia Data Access Committees (DACs). To learn more about how to apply for data access, please review the “Applying for Data Access” guide.

We are continuously adding more data and working on quality improvements. As such, the data in the file repository may change as we work through known issues and improve our processing pipelines.

Available Datasets

Pediatric Brain Tumor Atlas: CBTTC

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available:
    • Whole Genome Sequencing (WGS), RNA-Seq, Histology Images, Pathology Reports, Radiology Images, Radiology Reports, Operation Reports
  • Sequencing Center: Various
  • About the Study: CBTTC Website
  • Data Access Committee: CBTTC Data Access Committee
  • Applying for Access: CBTTC Data Access Form
  • DOI: 10.24370/SD_BHJXBDQK
  • Known Data Issues:
    • CBTTC clinical event data is collected in a way that associates a diagnosis to a biospecimen, most often a tumor. A participant can have multiple tumors over time that have different diagnoses. Currently, this data in the Kids First Data Resource Portal is being presented as a diagnosis being attached to the participant and the association between tumor and diagnosis is not being displayed. This issue is being worked on. In the meantime, a list of diagnoses and directly associated clinical events is available by emailing support@kidsfirstdrc.org.
  • Release Notes:
  • 4/5/19:
    • Removed lingering Autistic Disorder Phenotypes as part of the 3/28-29 clean up.
  • 3/28-29/19:
    • Refreshed Diagnosis, Outcome, and Demographic data based on latest records update.
    • Added back GRCh38 harmonized data for all samples. This adds back the missing reads in the q-arm of ChrX.
    • Removed Cancer Predisposition & “Other” diagnoses to focus on just the brain-tumor/cancer specific diagnoses.
  • 2/8/19:
    • Removed all GRCh38 harmonized data per latest Known Issue impacted by the missing reads in the q-arm of ChrX.
  • 2/5/19:
    • Fixed the Sequencing Experiments for the following Samples. Due to a library naming issue, they were associated with the incorrect Experiment Strategy. The affected files & their resolutions are listed in the attached sheet HERE
    • Removed the attached sheet of participants & all associated data for administrative review HERE
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
    • As part of the mapping of harmonized files to Sequencing Experiments for CBTTC, we have changed the sequencing experiment external IDs.
    • Imported 36 new harmonized files for 36 existing biospecimens.
    • Associated Diagnoses for 21 tumor Biospecimens; the biospecimen - diagnoses relationships were previously missing.
    • Corrected a mis-assigned aliquot ID for sample 7316-238-T-232096.WGS.
    • Corrected the analyte type for 7316-14-N-8710.WGS from RNA to DNA.
  • 9/18/18:
    • Removed source Expression (rsem) files from the portal. We will be providing harmonized versions in the near future.
    • Removed the biospecimen/diagnosis association for diagnoses classified as “Pre Existing Medical Conditions” and “Cancer Predispositions”. These are associated to participants and not to specific biospecimens.
    • Refreshed all diagnosis values from the CBTTC source databases as the clinical team continues to reclassify missing or “Other” diagnoses into defined buckets. This refresh also brought in more diagnosis values for the “Pre Existing Medical Conditions” and “Cancer Predispositions”.
    • Updated composition for 12 biospecimens to Derived Cell Line. They were previously set to Solid Tissue: 7316-1746-T-365613.WGS, 7316-1746-T-365613.RNA-Seq, 7316-1763-T-365902.RNA-Seq, 7316-3058-T-548405.WGS, 7316-85-T-61659.WGS, 7316-85-T-61659.RNA-Seq, 7316-1763-T-365902.WGS, 7316-3058-T-548405.RNA-Seq, 7316-913-T-345474.WGS, 7316-913-T-345474.RNA-Seq, 7316-85-T-61659.WGS, 7316-85-T-61659.RNA-Seq
    • Set composition for 4 biospecimens to Plasma. They were previously set to Blood: 7316-931-P-345767.WGS, 7316-467-P-323685.WGS, 7316-883-P-344950.WGS, 7316-378-P-242813.WGS
    • Removed the following biospecimens and associated data files because potential tumor-normal mismatch issues have been identified and are undergoing further QC review: 7316-471-T-323762.WGS, 7316-406-T-311440.WGS, 7316-2658-T-479078.WGS, 7316-878-T-344873.WGS, 7316-471-N-323754.WGS, 7316-406-N-311439.WGS, 7316-2658-N-479074.WGS, 7316-878-N-344866.WGS
    • Set source text anatomical site to “Central Nervous System” for external ID 7316-333-N-242258.WGS
  • 9/10/18: Initial versioned release of the Pediatric Brain Tumor Atlas. CBTTC data are made publicly available pre-publication under the above DOI, with processed data available on PedCBioPortal.

Orofacial Cleft: European Ancestry

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Washington University with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Mary Marazita, PI
  • Data Access Committee: Joint NIAMS-NIDCR Data Access Committee
  • Applying for Access: phs001168 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 4/10/19:
    • Added additional samples to the study in conjunction with dbGaP release.
    • Added back harmonized data for the files removed on 2/8/19.
  • 2/8/19: Removed the following GRCh38 harmonized files per latest Known Issue impacted by the missing reads in the q-arm of ChrX LIST HERE
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
    • Removed a small number of participants for further genomic QC review. Full list: LIST HERE
  • 11/5/18:
    • Originally IA3006 was proband with parents IA3004 and IA3005. Now, IA3004 is proband with parents IA3005 and IA3006
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • Removed the following biospecimens due to QC issues found during genomic data review.
      • External Participant & Sample ID -> Exclusion Reason
      • MD0031 -> Uncertain Identity of samples
      • MD0032 -> Uncertain Identity of samples
      • MD0033 -> Uncertain Identity of samples
      • MD0280 -> Uncertain Identity of samples
      • MD0281 -> Uncertain Identity of samples
      • MD0282 -> Uncertain Identity of samples
      • PA2063 -> Uncertain identity of sample
      • PA2027 -> Uncertain identity of sample
      • PA2200 -> High missing rate
      • PA2254 -> Duplicate of another sample
      • IA2650 -> Uncertain identity of samples
      • IA2651 -> Uncertain identity of samples
      • IA2652 -> Uncertain identity of samples
      • IA2836 -> Uncertain identity of samples
      • IA2837 -> Uncertain identity of samples
      • IA2838 -> Uncertain identity of samples
      • IA4062 -> High missing rate
      • MD3181 -> High Het/Hom ratio
      • IA4019 -> High missing rate
      • IA4022 -> High missing rate
      • IA3038 -> Definitely unrelated to offspring
      • IA4054 -> High missing rate
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.

Ewing Sarcoma: Genetic Risk

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Washington University with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Joshua Schiffman, PI
  • Data Access Committee: NCI DAC
  • Applying for Access: phs001228 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 4/23/19:
    • Added back GRCh38 harmonized data for all samples. This adds back the missing reads in the q-arm of ChrX.
    • Updated the list of samples hidden that are undergoing further genomic QC Review. LIST HERE
  • 2/22/19: Fixed issue where some Normal blood/saliva samples were incorrectly labeled as Tumor. Correctly labeled their composition as Normal.
  • 2/8/19: Removed the following GRCh38 harmonized files per latest Known Issue impacted by the missing reads in the q-arm of ChrX LIST HERE
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
    • Removed 35 participants & data for further genomic QC review. Full details can be found on this sheet HERE
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18: Updated Participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.

Syndromic Cranial Dysinnervation

  • First Portal Release Date (beta): August 23rd, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Elizabeth Engle, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001247 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 4/23/19:
    • Added back GRCh38 harmonized data for all samples. This adds back the missing reads in the q-arm of ChrX.
    • Updated the list of samples hidden that are undergoing further genomic QC Review. LIST HERE
  • 2/8/19:
    • Removed all GRCh38 harmonized data per latest Known Issue impacted by the missing reads in the q-arm of ChrX.
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
    • Removed 50 participants & data for further genomic QC review. Full details can be found on this sheet HERE.
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • Fixed participant with missing family ID. PT_BX9B2A7T now has the correct family ID.
    • Updated participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.

Congenital Heart Defects

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Christine Seidman, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001138 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 4/5/19: Updated consents & ACLs in accordance with latest dbGaP release.
  • 3/28/19:
    • Added in FY16 samples in coordination with latest dbGaP release.
    • Added in GRCh38 harmonized data for both FY15 & FY16 samples. This adds back the missing reads in the q-arm of ChrX.
  • 2/8/19:
    • Removed all GRCh38 harmonized data per latest Known Issue impacted by the missing reads in the q-arm of ChrX.
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • The study was successfully decoupled from its parent study. As part of this, the data is now downloadable from the portal for those who have been granted dbGaP access.
    • Updated participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center.

Adolescent Idiopathic Scoliosis

  • First Portal Release Date: October 12th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Hudson Alpha with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Jonathan Rios, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001410 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
    • Removed 37 participants & data for further genomic QC review. Full details can be found on this sheet HERE
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Initial portal release

Congenital Diaphragmatic Hernia

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Wendy Chung, PI
  • Data Access Committee: NICHD-DAC
  • Applying for Access: phs001110 dbGaP Study Page
  • Known Data Issues:
    • This study is currently missing some phenotypic & clinical data. We are currently working on curating it.
  • Release Notes:
  • 4/23/19:
    • Added back GRCh38 harmonized data for all samples. This adds back the missing reads in the q-arm of ChrX.
    • Updated the list of samples hidden that are undergoing further genomic QC Review. LIST HERE
  • 2/8/19:
    • Removed all GRCh38 harmonized data per latest Known Issue impacted by the missing reads in the q-arm of ChrX.
  • 2/5/19:
    • Major release for back end structural changes in preparation to release family-based joint genotyping files.
    • Removed 7 participants & data for further genomic QC review. Full details can be found on this sheet HERE
  • 11/5/18:
    • Published major release to account for underlying data model changes to add method_of_sample_procurement on biospecimen and to make HPO terms searchable by their standard name alongside their code.
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
    • There are 12 participant/biospecimen IDs that have changed since the last release of this study. However, the old IDs are still referenced on dbGaP. Thus, in this release the External Sample Id and External Aliquot Id fields on biospecimen will refer to the old IDs. The External Id field on other clinical entities such as participant, family_relationship, diagnosis, phenotype, and outcome refer to/contain the new IDs. Once dbGaP is updated with the new IDs, the biospecimen External Sample Id and External Aliquot Id fields will be updated.
      • Old: 216 / New: CDH216
      • Old: 217 / New: CDH217
      • Old: 319 / New: CDH319
      • Old: 320 / New: CDH320
      • Old: 549 / New: CDH549
      • Old: 576 / New: CDH576
      • Old: 01-0218 / New: CDH218
      • Old: 01-0318 / New: CDH318
      • Old: 01-0577 / New: CDH577
      • Old: 05-0015 / New: CDH05-0015
      • Old: 5-15F / New: CDH5-15F
      • Old: 5-15M / New: CDH5-15M
  • 9/18/18:
    • Assigned all probands “Congenital diaphragmatic hernia” as a diagnosis. Previously, no diagnoses were assigned.
    • Added proband label to the children in the trios.
    • Updated participant consent type abbreviations.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.
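
The old-to-new ID list in the 10/05/18 release note above can be applied programmatically when reconciling biospecimen records (which still carry the old IDs) with other clinical entities (which carry the new IDs). The following is a minimal Python sketch and is not part of the portal: the dictionary values are copied from the list above, while the names `OLD_TO_NEW` and `to_new_id` are hypothetical and chosen only for illustration.

```python
# Old -> new external IDs, copied from the 10/05/18 release note above.
OLD_TO_NEW = {
    "216": "CDH216",
    "217": "CDH217",
    "319": "CDH319",
    "320": "CDH320",
    "549": "CDH549",
    "576": "CDH576",
    "01-0218": "CDH218",
    "01-0318": "CDH318",
    "01-0577": "CDH577",
    "05-0015": "CDH05-0015",
    "5-15F": "CDH5-15F",
    "5-15M": "CDH5-15M",
}


def to_new_id(external_id: str) -> str:
    """Translate an old-style ID to its new form.

    IDs that are not in the table (including IDs that are already
    in the new form) are returned unchanged.
    """
    return OLD_TO_NEW.get(external_id, external_id)


# Example: normalize a mixed list of old and new IDs.
mixed_ids = ["216", "CDH217", "01-0318"]
normalized = [to_new_id(i) for i in mixed_ids]
```

Returning unknown IDs unchanged means the same function can be run over an entire column without first separating affected from unaffected records.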

Disorders of Sex Development

  • First Portal Release Date: March 26th, 2019
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Eric Vilain, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001178 dbGaP Study Page
  • Known Data Issues:
    • This study does not have probands - it has affected children. Thus, family relationship types display as “Other” instead of duo, trio, etc. We are working on resolving this.
    • The standard QC performed by the DRC using Peddy reports a high number of participants whose genetically determined sex does not match the gender reported. However, due to the nature of this study, we have chosen to release the data on the portal while further review occurs.
  • Release Notes:
  • 4/23/19:
    • Added GRCh38 harmonized data for all samples. This adds back the missing reads in the q-arm of ChrX.
    • Added back 49 participants that were removed for further genomic QC review.
  • 3/26/19:
    • First Release on Kids First Data Resource Portal in conjunction with public dbGaP release.

TARGET AML

  • First Portal Release Date: March 29th, 2019
  • Data Types Available: Aligned Reads made available via the GDC (Genomic data is not stored at the Kids First DRC)
  • Applying for Access: phs000465 dbGaP Study Page
  • General Notes:
    • Genomic data is not stored at the Kids First DRC; only the file metadata and clinical data are indexed on the portal.
    • Access is governed through the NCI CRDC Framework Services, and users will be required to authenticate a second time with their eRA Commons.
  • Release Notes:
  • 3/29/19:
    • First Release on Kids First Data Resource Portal.

TARGET NBL

  • First Portal Release Date: March 29th, 2019
  • Data Types Available: Aligned Reads made available via the GDC (Genomic data is not stored at the Kids First DRC)
  • Applying for Access: phs000467 dbGaP Study Page
  • General Notes:
    • Genomic data is not stored at the Kids First DRC; only the file metadata and clinical data are indexed on the portal.
    • Access is governed through the NCI CRDC Framework Services, and users will be required to authenticate a second time with their eRA Commons.
  • Release Notes:
  • 3/29/19:
    • First Release on Kids First Data Resource Portal.

Known Data Issues

Last Updated: 2/6/19

Below is a list of other known data issues that we are actively working to resolve.

Other Issues:

  • HPO Values: Some HPO values may be missing or incorrectly assigned. We are actively reviewing and QCing these across all studies.
  • Future Use Facets: The following facets are available in the portal but are in development for future use and may not have valid values at this time:
    • Alias Group
    • Availability
    • ICD ID Diagnosis
    • HPO Observed/Not Observed
    • Release Status
    • Shipment Date
    • Shipment Origin
    • SNOMED Observed/Not Observed
    • Spatial Descriptor (Biospecimen)
    • Spatial Descriptor (Diagnosis)

Notice an issue?

We are continuously looking for feedback on how to make the data on the Kids First DRP more searchable and usable to the community. If you notice an issue, have a question or want to provide a suggestion, please use the feedback widget within the portal or email us at support@kidsfirstdrc.org.

Access Data

Data Access Levels

The Kids First DRC supports three different data access tiers.

NIH Trusted Partner Environment

A “trusted partner” is defined as a public or private, national or international organization that is able to meet core NIH standards for establishing data quality and data management service protocols for NIH, based on the programmatic need of an NIH funding Institute or Center (IC).

Bionimbus is a cloud-agnostic trusted partner, operated by the University of Chicago and powered by the Gen3 software stack.

By partnering with Bionimbus, Kids First datasets can be distributed in line with the NIH’s current genomic data sharing policies:

  • Data will be maintained through controlled access:
    • A. Permission to access data will be requested through NIH Data Access Committees, per NIH-prescribed processes for the institutional certification of data sharing requests
    • B. Standard telemetry will be used to communicate with NIH systems for authenticating Approved Users through the dbGaP data request process

Linking your eRA Commons Account to Gen3 & the Portal

To analyze data on Cavatica or to download genomic files locally, you must link your Kids First DRC Account to Gen3 via your eRA Commons login.

  1. Sign into your Kids First DRC Portal account.
  2. Navigate to Settings from the upper right-hand corner drop down, under your name.
  3. Under settings, scroll down to the Integrations section. Locate Gen3 Data Commons. Click Connect.
  4. You will be directed to https://auth.nih.gov/ to sign in. Providing your eRA Commons credentials will redirect you back to the Portal and complete your Gen3 integration.

Applying for Data Access

Access to controlled-access data requires authorization from the appropriate Data Access Committee (DAC). While access to most datasets within the Kids First DRC is granted through dbGaP, access to some datasets is reviewed & granted through consortia DACs. Please reference the datasets above for their specific access management information. For any questions on how to apply for dbGaP access, please visit their page here.