Clinical Data Harmonization
Clinical and phenotypic data from each research study contain demographic and condition-specific information. This information is harmonized using community-based ontologies and standards such as the Human Phenotype Ontology (HPO) and/or the NCI Thesaurus (NCIt). The goal of this harmonization is to make analysis easier across different datasets and disease types, both within the Kids First Data Resource Center and across other genomic datasets. The harmonization process is iterative, with the goal of addressing scientific use cases. To learn more about the scope of this endeavor, see About the Research.
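The mapping idea behind this harmonization can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kids First pipeline: it assumes a hypothetical, hand-built lookup table from study-specific free-text labels to HPO term IDs, and the participant IDs and the functions `harmonize` and `count_by_term` are invented for the example. It shows why harmonization helps cross-study analysis: differently worded labels from two studies resolve to the same HPO ID and can then be counted together.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical lookup table (an assumption for illustration): normalized
# study-specific phenotype labels -> HPO term IDs. A real pipeline would draw
# on the full ontology plus curator review rather than a small dict.
HPO_LOOKUP = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "global developmental delay": "HP:0001263",
    "microcephaly": "HP:0000252",
    "atrial septal defect": "HP:0001631",
}


@dataclass
class HarmonizedPhenotype:
    participant_id: str
    source_text: str          # the label exactly as the study recorded it
    hpo_id: Optional[str]     # None when no mapping was found


def normalize(label: str) -> str:
    # Lowercase and collapse whitespace so "Seizures " and "seizures" match.
    return " ".join(label.lower().split())


def harmonize(records: Iterable[Tuple[str, str]]) -> List[HarmonizedPhenotype]:
    """Map (participant_id, free-text phenotype) pairs onto HPO IDs."""
    return [
        HarmonizedPhenotype(pid, text, HPO_LOOKUP.get(normalize(text)))
        for pid, text in records
    ]


def count_by_term(harmonized: Iterable[HarmonizedPhenotype]) -> Counter:
    # Count mapped phenotypes per HPO ID; unmapped labels are skipped.
    return Counter(h.hpo_id for h in harmonized if h.hpo_id is not None)


if __name__ == "__main__":
    # Two hypothetical studies that word the same phenotype differently.
    study_a = harmonize([("PT_001", "Seizures"), ("PT_002", "Microcephaly")])
    study_b = harmonize([("PT_101", "seizure"), ("PT_102", "Cleft palate")])
    print(count_by_term(study_a + study_b))
```

Labels with no entry in the table (here "Cleft palate") come back with `hpo_id` set to `None`, which is one way an iterative process could surface terms that still need curation.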
The goal of genomic harmonization is to provide an “analysis-ready” dataset that is “functionally equivalent” both across the Kids First datasets and with other large-scale genomic data initiatives. To that end, the initial pipelines are based on best practices for joint genotyping of genetic variation. We are currently investigating a number of somatic callers in the context of pediatric cancer and anticipate providing multiple pipelines for somatic calls. Structural variant calling and RNA-seq-based quantification are on the near-term roadmap.