
Search Data

File Repository & Filter/Facets

The File Repository is the primary method of accessing data in the Kids First Data Resource Portal. It provides an overview of the data available in Kids First and offers a variety of filters for identifying and browsing participants and files of interest. Users can access the File Repository from the top menu bar of the Kids First Data Portal.

On the left, a panel of data facets allows users to filter participants and files using a variety of criteria. If facet filters are applied, the table on the right will display information about matching participants and files. If no filters are applied, the table on the right will display information about all available data.
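The filtering behavior described above can be sketched in a few lines of Python. This is only an illustration, not the portal's implementation: the function name `apply_facets`, the record fields, and the sample values are hypothetical, and the sketch assumes that values within one facet are alternatives while different facets must all match.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]


def apply_facets(records: Iterable[Record], selected: Dict[str, Set[str]]) -> List[Record]:
    """Return the records that match every facet with at least one selected value.

    With no active facets, all records are returned, mirroring the
    "no filters applied" case described in the text.
    """
    # Ignore facets that have no selected values.
    active = {facet: values for facet, values in selected.items() if values}
    return [
        rec
        for rec in records
        if all(rec.get(facet) in values for facet, values in active.items())
    ]


# Hypothetical sample rows; the IDs and values are illustrative only.
files = [
    {"file_id": "GF_EXAMPLE1", "data_type": "Aligned Reads", "file_format": "cram"},
    {"file_id": "GF_EXAMPLE2", "data_type": "Variant Calls", "file_format": "vcf"},
]

all_files = apply_facets(files, {})
vcf_only = apply_facets(files, {"file_format": {"vcf"}})
```

Here `all_files` contains both rows because no filter is active, while `vcf_only` contains only the second row.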

When filters are applied, a banner above the table displays the active filters and provides options to share the query or save it for later reference.

The File Repository section provides access to additional data filters beyond the defaults. Filters corresponding to additional properties listed in the Kids First Data Dictionary can be added using the ALL FILTERS button available at the top of the filter panel.

The button opens a search window that allows the user to find an additional filter by name or value. Not all filters have values available for filtering; checking the “Show only fields with values” checkbox will limit the search results to only those that do.
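As a rough sketch of how such a search could behave, the Python below matches a search term against filter names and values, and optionally skips filters that have no values. The function `search_filters`, the `catalog` structure, and the sample values are assumptions made for illustration; they do not describe the portal's actual code.

```python
from typing import Dict, List


def search_filters(catalog: Dict[str, List[str]], term: str,
                   only_with_values: bool = False) -> List[str]:
    """Return the names of filters whose name or any value contains `term`."""
    needle = term.lower()
    hits = []
    for name, values in catalog.items():
        # Mimics the "Show only fields with values" checkbox.
        if only_with_values and not values:
            continue
        if needle in name.lower() or any(needle in v.lower() for v in values):
            hits.append(name)
    return hits


# Hypothetical catalog: an empty list stands for a filter with no values.
catalog = {
    "Vital Status": ["Alive", "Deceased"],  # illustrative values
    "Shipment Date": [],
    "Shipment Origin": [],
}

by_name = search_filters(catalog, "shipment")
by_name_with_values = search_filters(catalog, "shipment", only_with_values=True)
by_value = search_filters(catalog, "deceased")
```

In this sketch, searching for "shipment" finds two filters, but neither survives once the search is limited to filters with values.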

File Browser Column Definitions

Below are the definitions and descriptions of the column headers in the Kids First Data Resource Portal File Browser. Some of these overlap with definitions in the Data Dictionary and are noted as such where applicable.

The portal displays a set of default columns. More columns can be added to the view by selecting the “Columns” drop-down within the file browser.

Column

Description

Notes

File ID

The Kids First DRC unique identifier for the file.

These always start with GF_ for “Genomic File”

Participants ID

The Kids First DRC unique identifier for the participant.

These always start with PT_ for “Participant”

Study Name

See definition in Default Clinical Filters here.

 

Proband

See definition in Default Clinical Filters here.

 

Family ID

The Kids First DRC unique identifier for the family.

  • These always start with FM_ for “Family.”
  • All members of a family will have the same family ID.

Data Type

See definition in Default File Filters here.

 

File Format

See definition in Default File Filters here.

 

File Size

See definition in File Filters here.

 

File Download (No label - always last column)

If you have access to download a file, you will see a clickable down-pointing arrow button. If you do not have access, you will see a lock icon instead.

Clicking the down-pointing arrow downloads the file.

Participant External ID

The external ID of the participant provided by the investigator of the original study. This is a deidentified ID unique only within its given study.

 

File Name

The name of the file.

 

File External ID

The external ID of the file in the original study. This is often the file name.

 

Aliquot External ID

The external ID of the aliquot provided by the investigator of the original study. This is a deidentified ID unique only within its given study.

 

Sample External ID

The external ID of the sample/biospecimen provided by the investigator of the original study. This is a deidentified ID unique only within its given study.

 

Biospecimen ID

The Kids First DRC unique identifier for the biospecimen.

These always start with BS_ for “Biospecimen”

Tissue Type (Source Text)

See definition in Default Clinical Filters here.

 

Diagnosis (Source Text)

See definition in Default Clinical Filters here.

 

Study ID

The Kids First DRC unique identifier for the study.

These always start with SD_ for “Study”

Latest DID

The Gen3 Document ID (DID) of the file.
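
The ID prefixes listed in this table (GF_, PT_, FM_, BS_, SD_) make it possible to tell what kind of entity an identifier refers to. The Python sketch below shows one way to do that. The function name `entity_type` and the sample IDs are hypothetical, and the sketch only checks the prefix; it makes no assumption about the format of the rest of the identifier.

```python
from typing import Optional

# Prefixes and labels as given in the column definitions above.
PREFIXES = {
    "GF": "Genomic File",
    "PT": "Participant",
    "FM": "Family",
    "BS": "Biospecimen",
    "SD": "Study",
}


def entity_type(kf_id: str) -> Optional[str]:
    """Return the entity label for a Kids First ID, or None if the prefix is unknown."""
    prefix, separator, rest = kf_id.partition("_")
    if not separator or not rest:
        # No underscore, or nothing after it.
        return None
    return PREFIXES.get(prefix)


# Hypothetical IDs used only to exercise the function.
participant = entity_type("PT_EXAMPLE1")
unknown = entity_type("XX_EXAMPLE1")
```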

 

Search Facet Definitions

Below are the definitions and some general notes for the Kids First DRC Data Dictionary and search facets. Because some of these definitions are common across many online genomic data resources, we have modeled some of them after the Genomic Data Commons (GDC), located here. Any definition marked with [1] comes from the GDC.

Default Facets

The portal provides two tabs of default facets to search over. The search terms are divided into the Clinical and File categories to make searching easier. The definitions below cover both categories.

Clinical Filters

Facet

Description

Notes

Study Name

The short name of the study derived from the Study Long Name.

Assigning short names for studies allows for easier searching on the portal.

Diagnosis Category

An overarching classification based on the Diagnosis Source Text to aid in quick searching over Cancer and Structural Birth Defects.

 

Diagnosis (Source Text)

Analysis, and recognition of the presence and nature of disease, condition, or injury from expressed signs and symptoms; also, the scientific determination of any kind; the concise results of such a study/investigation. [1]

Data as obtained directly from the original study/investigation.

Family Composition

A calculated value based on the family members present with genomic data within a participant’s pedigree.

  • Proband Only: Only the proband was sequenced.
  • Duo: The proband plus one parent was sequenced.
  • Duo+: The proband, one parent, and other family members were sequenced, but the sequenced members do not include both biological parents. (Ex: A proband, a mother, and a sister.)
  • Trio: The proband, their mother, and their father were sequenced.
  • Trio+: The proband, their mother, their father, and other family members were sequenced. (Ex: A proband, a mother, a father, and a sister)

Proband

The participant serving as the starting point for enrollment into the study, often the first family member seeking medical attention.

 

Gender

Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. This value is self-reported and may come from a form, questionnaire, interview, etc. [1]

 

Race

An arbitrary classification of a taxonomic group that is a division of a species. It is characterized by shared hereditary, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. [1]

 

Tissue Type (Source Text)

Text term directly from the original investigation that represents a description of the kind of tissue collected with respect to disease status or proximity to tumor tissue. [1]

Data as obtained directly from the original study/investigation.
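
The Family Composition categories defined in this table can be expressed as a small decision rule. The Python below is a minimal sketch under stated assumptions: the function name `family_composition` and the role labels ("proband", "mother", "father", and any other string for other relatives) are hypothetical, and a proband sequenced only with non-parent relatives is not covered by the definitions above, so the sketch returns None for that case.

```python
from typing import List, Optional

PARENT_ROLES = {"mother", "father"}


def family_composition(sequenced_roles: List[str]) -> Optional[str]:
    """Classify a family by the roles of its sequenced members."""
    roles = [role.lower() for role in sequenced_roles]
    if "proband" not in roles:
        raise ValueError("the categories are defined relative to a sequenced proband")

    parents = PARENT_ROLES & set(roles)
    others = [r for r in roles if r != "proband" and r not in PARENT_ROLES]

    if len(parents) == 2:
        return "Trio+" if others else "Trio"
    if len(parents) == 1:
        return "Duo+" if others else "Duo"
    if not others:
        return "Proband Only"
    # Proband plus non-parent relatives only: not defined in the text above.
    return None


duo_plus = family_composition(["proband", "mother", "sister"])
trio = family_composition(["proband", "mother", "father"])
```

The two sample calls correspond to the Duo+ example in the table (a proband, a mother, and a sister) and to a plain Trio.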

 

File Filters

Facet

Description

Notes

Experiment Strategy

The type of sequencing experiment performed on the biospecimen.

  • Whole Genome Sequencing (WGS)
  • RNA Sequencing (RNA-Seq)
  • Micro RNA Sequencing (miRNA-Seq)

Harmonized Data

Indicates whether the file has been harmonized to the Kids First DRC standards so that it can be used alongside any other file in the DRC from any other study and sequencing center.

Kids First DRC harmonization pipelines currently align to GRCh38. Read more about harmonization here.

Data Type

The high level type of data contained within the file.

 

File Format

The technical specification of the file, typically the file extension of the file.

 

Family Shared Data Types

All members in the family share this data type.

 

 

All Filters

File

Facet

Description

Notes

ACL

The access control list value assigned to the file based on data access committee (DAC) authorization. This is obtained from dbGaP for NIH-based datasets.

 

Availability

Value assigned that indicates whether or not the file is available for download within the DRC. This value does not take into account individual user’s permissions.

Reserved for potential future use. All data on the portal are currently available for download, so this field is not populated.

Access

The type of authorization required in order to download the file. Open Access files are available to any registered portal user. Controlled access files require approval & authorization by dbGaP or the controlling DAC.

For more information on Data Access, please see the Support page here.

Created At

The date that the file was created in the DRC’s system.

For potential future use, currently not populated with data.

Data Type

See definition in Default File Facets here.

 

File Format

See definition in Default File Facets

 

File Name

The exact name of the file.

 

Harmonized Data

See definition in Default File Facets here.

 

Modified At

The date that the file was last modified in the DRC’s system.

For potential future use, currently not populated with data.

Reference Genome

The reference genome against which the sequencing experiment was run.

Unharmonized files from the sequencing centers use different reference genomes depending on when and where they were sequenced. The DRC harmonized files are aligned to GRCh38.

File Size

The measure of the size of the file in KB, MB or GB.

 

 

Participant

Facet

Description

Notes

Alias Group

 

For potential future use, currently not populated with data.

Available Data Types

The File data types available for the participants.

See Data Types definition under Files for a more detailed Data Type definition.

Consent Type

The informed consent type that the participant agreed to at the time of sample collection. This is used to inform any data use limitations.

 

Ethnicity

An individual’s self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino.

 

Gender

See definition in Default Clinical Filters here.

 

Proband

See definition in Default Clinical Filters here.

 

Race

See definition in Default Clinical Filters here.

 



Biospecimen

Facet

Description

Notes

Age at Event (Days)

The participant’s age, in days, when the biospecimen was collected.

 

Analyte Type

Text term that represents the kind of molecular specimen analyte. [1]

 

Anatomical Site (Source Text)

The source text from the investigator that describes the disease site of the submitted sample. [1]

Data as obtained directly from the original study/investigation.

Composition

Text term that represents the cellular composition of the sample. [1]

 

Concentration (mg/ML)

Numeric value that represents the concentration of analyte or aliquot extracted from the sample or sample portion, measured in milligrams per milliliter. [1]

 

NCIt ID Tissue Type

The National Cancer Institute Thesaurus (NCIt) ID associated with the Tissue Type (Source Text) value. Derived by matching the Tissue Type Source Text with NCIt lookups.

 

NCIt ID Anatomical Site

The National Cancer Institute Thesaurus (NCIt) ID associated with the Anatomical Site (Source Text) value. Derived by matching the Anatomical Site Source Text value with NCIt lookups.

 

Participant’s Biospecimens Dbgap Consent Code

The dbGaP-assigned consent code used for access granting that is derived directly from the participant’s consent.

See Consent Type under participant for further definitions on consent.

Shipment Date

Date the biospecimen was shipped to the sequencing center.

For potential future use, currently not populated with data.

Shipment Origin

Location/institution from where the biospecimen was shipped.

For potential future use, currently not populated with data.

Spatial Descriptor

Term to indicate precise, relative anatomical position from where the biospecimen was obtained.

For potential future use, currently not populated with data.

Tissue Type (Source Text)

Text term directly from the original investigation that represents a description of the kind of tissue collected with respect to disease status or proximity to tumor tissue. [1]

Data as obtained directly from the original study/investigation.

Uberon ID Anatomical Site

The Uberon ID associated with the Anatomical Site (Source Text) value. Derived by matching the Anatomical Site Source Text value with Uberon lookups.

 

Volume (uL)

The volume in microliters of the analytes derived from the analyte(s) shipped for sequencing and characterization. [1]

 



Diagnosis

Facet

Description

Notes

Age at Event (Days)

The participant’s age, in days, when they were diagnosed with the disease.

 

Diagnosis

A calculated rollup of a participant’s diagnoses. If the participant has only Cancer diagnoses, the participant’s value is Cancer. If the participant has only Structural Birth Defect diagnoses, the value is Structural Birth Defects.

 

Diagnosis (Source Text)

See definition in Default Clinical Filters here.

Diagnosis Category

See definition in Default Clinical Filters here.

 

ICD ID Diagnosis

ICD10 code for the diagnosis.

For potential future use, currently not populated with data.

Mondo ID Diagnosis

The Mondo ID associated with the Diagnosis (Source Text) value. Derived by matching the Diagnosis Source Text value with Mondo ID lookups.

 

NCIt ID Diagnosis

The National Cancer Institute Thesaurus (NCIt) ID associated with the Diagnosis (Source Text) value. Derived by matching the Diagnosis Source Text value with NCIt lookups.

 

Spatial Descriptor

Term to indicate precise, relative anatomical position of the diagnosis.

For potential future use, currently not populated with data.

Tumor Location (Source Text)

Text term from the investigator that describes the anatomic site of the tumor. [1]

Data as obtained directly from the original study/investigation.
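
The Diagnosis rollup described in this table can be sketched as follows. This is an illustration only: the function name `diagnosis_rollup` and the input labels are assumptions, and because the definition above does not say what value is assigned when a participant has both kinds of diagnoses (or none), the sketch returns None in those cases rather than guessing.

```python
from typing import Iterable, Optional


def diagnosis_rollup(categories: Iterable[str]) -> Optional[str]:
    """Roll up a participant's diagnosis categories into a single value."""
    unique = set(categories)
    if unique == {"Cancer"}:
        return "Cancer"
    if unique == {"Structural Birth Defect"}:
        return "Structural Birth Defects"
    # Mixed or empty input: not specified in the definition above.
    return None


cancer_only = diagnosis_rollup(["Cancer", "Cancer"])
mixed = diagnosis_rollup(["Cancer", "Structural Birth Defect"])
```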

 

Family

Facet

Description

Notes

Family Composition

See definition in Default Clinical Filters here.

 

All other Family facets are derived from the definitions above and apply to files that belong to all members of a family.

 

Outcome

Facet

Description

Notes

Age at Event (Days)

Participant’s age, in days, at the time of the Outcome event.

 

Disease Related

Text value describing whether or not the participant’s outcome is related to their disease. For example, whether their deceased status was due to their disease.

 

Vital Status

The survival state of the participant.
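
Because the Age at Event facets are expressed in days, it can be convenient to convert a value to years when reading results. The Python below is a small sketch of that conversion. The function name `age_in_years` is hypothetical, and the use of 365.25 days per year is an approximation chosen for this example, not something the portal specifies.

```python
# Approximate average year length, an assumption made for this sketch.
DAYS_PER_YEAR = 365.25


def age_in_years(age_days: float) -> float:
    """Convert an Age at Event (Days) value to approximate years."""
    return age_days / DAYS_PER_YEAR


# 7305 days is 20 years under the 365.25-day approximation.
years = age_in_years(7305)
```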

 

 

Phenotype

Facet

Description

Notes

Age at Event (Days)

Participant’s age, in days, when the phenotype was observed.

 

Ancestral HPO IDs

The Human Phenotype Ontology value associated with the Participant Phenotype (Source Text) value. Derived by matching the Phenotype Source Text value with HPO lookups.

 

External ID

External ID provided by the investigator of the original study for the Phenotype observation.

 

HPO Phenotype Observed

Files for which the HPO ID was positively observed.

 

Participant Phenotype (Source Text)

The observable characteristics in a participant resulting from the expression of genes, environment factors, and their interactions.  

Data as obtained directly from the original study/investigation.

Participants Phenotype HPO - HPO Phenotype Not Observed

Files for which the HPO ID was negatively observed.

 

Participants Phenotype HPO - Snomed Phenotype Not Observed

Files for which the Snomed value associated with the Participant Phenotype (Source Text) was negatively observed. Derived by matching the Phenotype Source Text value with Snomed lookups.

 

Participants Phenotype HPO - Snomed Phenotype Observed

Files for which the Snomed value associated with the Participant Phenotype (Source Text) was positively observed. Derived by matching the Phenotype Source Text value with Snomed lookups.

 

 

Study

Facet

Description

Notes

Data Access Authority

The authoritative group responsible for providing access.

 

Release Status

The status of the study within its Data Access Authority.

For potential future use, currently populated with “Pending”.

Study Long Name

The name of the study for which the original sample was sequenced and researched. Samples and participants are organized in the portal by their originating study.

IDs are the Kids First ID, which starts with SD_. If the study is a dbGaP study, the external ID for the study will be the PHS accession number.

Study Name

See definition in Default Clinical Filters here.

Version

The dbGaP version of the study.

 

 

Sequencing Experiment

Facet

Description

Notes

Experiment Date

Date the sample was sequenced.

 

Experiment Strategy

See definition in Default File Filters here.

Instrument Model

The model of the sequencer used to obtain data.

 

Is Paired End

Are there paired ends? [1]

 

Library Name

Name of library. [1]

 

Library Strand

Library stranded-ness. [1]

 

Max Insert Size

Max number of bases found between paired-end adapters.

 

Mean Insert Size

Mean number of bases found between paired-end adapters.

 

Mean Read Length

Mean length of the sequenced fragments. [1]

 

Platform

Name of platform used to obtain data. [1]

 

Total Reads

Total number of reads from the sequencing experiment.