The NIH Common Fund Data Ecosystem

The Common Fund supports a number of Data Coordinating Centers (DCCs), such as the Kids First Data Resource Center, that provide curated data derived from hundreds of studies and samples collected from thousands of human subjects. An incredible diversity of data types has been generated at the genomic, expression, proteomic, metagenomic, and imaging levels, and the DCCs support a tremendous range of scientific discovery efforts.

However, clinical and biomedical researchers currently find it hard to use the resources generated by the Common Fund. It is difficult to search across all of the Common Fund data sets, and the resources are not readily usable in combination. The individual DCCs also need support for enhanced protected data access, long-term data storage, training, interconnection with flexible data analysis platforms, and continued availability of data and data portals past the end of the Common Fund Program lifecycle.

The Common Fund Data Ecosystem (CFDE) was established in early 2019 to address challenges faced by end users as well as the DCCs themselves. To assist the Common Fund DCCs, the CFDE supports individual DCC needs with targeted investments in interoperability, authentication/access to protected data, training, program lifecycle support, and evaluation of practical barriers to data Findability, Accessibility, Interoperability, and Reusability (FAIR). The CFDE also coordinates a monthly virtual “cross-pollination” seminar to connect DCCs across the Common Fund and beyond.

A key investment by the CFDE is in cross-DCC data discovery. Each of the DCCs hosts many assets (data files), such as genomic sequence, metagenomic data, RNA-seq, and physiological and metabolic data, and it is hard to discover these assets across DCCs. Moreover, information describing the contents of the files is not available in a standardized format. This prevents DCCs from making use of each other’s data, makes the data less discoverable by others, and hinders interoperability. To improve federation, the CFDE has created a central portal with a collection of inventories derived from the data hosted by the DCCs. The portal is still under development, but it will eventually describe all the assets at each DCC and make them discoverable through this centralized interface.

The advantage of this approach is that forming the ecosystem does not require that the data assets themselves be available via a central repository: only the inventories describing those assets are centralized. Cataloging all of the Common Fund assets is a simple and effective means of liberating data from what would otherwise be many siloed repositories, and it therefore greatly increases the FAIRness of all Common Fund data. This form of data federation can also be extended to programs funded by other institutes, and easily linked to other NIH ecosystems: once an inventory system is available, it can be used by anyone.
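To make the idea of centralizing only the inventories concrete, here is a minimal sketch in Python. It is not the CFDE portal's actual data model or API: the class names, field names, DCC names, and URLs are all invented for illustration. It only shows that a central catalog can answer cross-DCC searches while holding nothing but descriptive records that point back to where each DCC hosts its files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """One inventory entry: metadata about a file, not the file itself."""
    dcc: str        # which DCC hosts the asset (hypothetical name)
    asset_id: str   # identifier local to that DCC (hypothetical)
    data_type: str  # e.g. "RNA-seq"
    location: str   # where the DCC serves the file from (placeholder URL)


class CentralCatalog:
    """Holds inventories from many DCCs; never stores the data files."""

    def __init__(self):
        self._records = []

    def ingest(self, inventory):
        """Add one DCC's inventory (an iterable of AssetRecord)."""
        self._records.extend(inventory)

    def search(self, data_type):
        """Return matching records from every DCC, sorted by DCC and asset ID."""
        wanted = data_type.lower()
        hits = [r for r in self._records if r.data_type.lower() == wanted]
        return sorted(hits, key=lambda r: (r.dcc, r.asset_id))


# Hypothetical inventories; every name and URL below is a placeholder.
dcc_a = [
    AssetRecord("DCC-A", "a-001", "RNA-seq", "https://dcc-a.example.org/files/a-001"),
    AssetRecord("DCC-A", "a-002", "genomic sequence", "https://dcc-a.example.org/files/a-002"),
]
dcc_b = [
    AssetRecord("DCC-B", "b-107", "RNA-seq", "https://dcc-b.example.org/files/b-107"),
]

catalog = CentralCatalog()
catalog.ingest(dcc_a)
catalog.ingest(dcc_b)

# One query spans both DCCs, yet the files stay where each DCC hosts them.
for record in catalog.search("rna-seq"):
    print(record.dcc, record.asset_id, record.location)
```

In this sketch, adding another program to the federation means one more `ingest` call with that program's inventory. No data files move.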

The CFDE is also working with Seven Bridges Genomics to connect the portal to their Cavatica platform in order to support custom data analysis workflows. Cavatica is a Seven Bridges product that lets beginner- and intermediate-level users conduct bioinformatics analysis with Kids First data. Its graphical user interface gives access to Kids First data, or to imported files, in a visual editor where analysis workflows can be customized by pointing and clicking. The Cavatica workbench is designed for clinicians and non-bioinformatics researchers who may not be well versed in the command line or in programming. For more advanced users with programming experience, Cavatica also offers the ability to construct new tools and pipelines.

Developers at Cavatica are currently funded under the auspices of the CFDE to tie their interface directly to the CFDE portal. An initial implementation of this system is expected by the end of 2021. It is being designed to let users create shopping cart lists of data from the Common Fund DCCs, import those files into the Cavatica workbench, and perform analyses there.
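As a rough illustration of the shopping cart idea, the Python sketch below collects asset references from several DCCs and serializes them as a file list that an analysis platform could import. The manifest layout, class name, and identifiers are assumptions made for this example. The post does not specify how the portal and Cavatica will exchange file lists.

```python
import json


class Cart:
    """A user's selection of asset references gathered across DCCs."""

    def __init__(self):
        self._items = {}

    def add(self, dcc, asset_id, location):
        # Key on (dcc, asset_id) so adding the same asset twice has no effect.
        self._items[(dcc, asset_id)] = location

    def to_manifest(self):
        """Serialize the cart as JSON (a made-up format, for illustration)."""
        entries = [
            {"dcc": dcc, "asset_id": asset_id, "location": location}
            for (dcc, asset_id), location in sorted(self._items.items())
        ]
        return json.dumps({"files": entries}, indent=2)


# Hypothetical selections; DCC names, IDs, and URLs are placeholders.
cart = Cart()
cart.add("DCC-A", "a-001", "https://dcc-a.example.org/files/a-001")
cart.add("DCC-B", "b-107", "https://dcc-b.example.org/files/b-107")
cart.add("DCC-A", "a-001", "https://dcc-a.example.org/files/a-001")  # duplicate

manifest = cart.to_manifest()
print(manifest)
```

The cart stores references only, so exporting it is cheap. The receiving platform would fetch each file from its listed location, subject to whatever access controls the hosting DCC applies.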

The CFDE is also building a training program in partnership with Kids First and the other Common Fund DCCs to enable end users to make use of the Common Fund data sets, in order to accelerate basic and clinical research. This training program, available at https://training.nih-cfde.org/, will support a wide range of users with guides to using CFDE technologies as well as the resources of specific DCCs. Our existing training includes a CFDE portal guide as well as information on how to use the Kids First portal, and will soon be expanded to include data analysis on Cavatica.
