To consider the state of national and international collaboration, scientific research, and cloud-based access in the year 2020 is to marvel at the spirit of innovation, commitment to big goals, and ability to remain flexible that have been shown across the pediatric medical community. As this unique and challenging year continues, scientists and clinicians, patients and families, research coordinators and medical teams, academic and agency leadership, and countless others have remained agile in advancing research efforts that can positively impact the health and wellness of future generations.
At the NIH Common Fund-supported Gabriella Miller Kids First Data Resource Center (Kids First DRC), no less could be said for the investigators, bioinformaticians, data engineers, and coordinators who have delivered an incredible wealth of resources towards the study of childhood cancer and structural birth defects, and who have advanced collaboration, interoperability, and scientific discovery across the National Institutes of Health and other pediatric research efforts.
Since our last report in April, 141 TB of Whole Genome Sequence (WGS) and Whole Exome Sequence (WXS) data across four datasets have been added to the Kids First Data Resource Portal. These data represent over 1,400 study participants and more than 400 families experiencing childhood cancers and orofacial cleft disorders. An all-new Portal support website was launched to help users navigate the data request process and better understand the tools, resources, and features available through the Portal. New analysis tools, including an HPO Ontology Browser, were launched, and a number of collaborative efforts have developed further.
New Data
In mid-June, a dataset from the Kids First study on Familial Leukemia was released onto the portal, containing over 55 TB of WGS data, consisting of aligned reads, unaligned reads and variant calls from 365 study participants and 56 families. Tissue samples were composed of both tumor and normal tissue collected primarily from trio and trio+ family groups. This study is led by Dr. Charles Mulligan of St. Jude Children’s Research Hospital. Information about this study can be found here.
On June 23, a dataset from Dr. Azeez Butali of the University of Iowa and Dr. Terri Beaty of Johns Hopkins University was released onto the portal, related to their Kids First study on Orofacial Cleft patients of Asian and African Ancestry. The dataset consists of more than 17 TB of WGS aligned reads and gVCF data from 725 study participants and 244 families. All tissue samples were collected from patients experiencing cleft lip, cleft palate, and hypertelorism, and from their parents. The addition of these data has expanded the collective size and diversity of Orofacial Cleft data accessible through the Data Resource to more than 136 TB, from 2,824 patients and 985 families of European, Latin American, Asian, and African descent. This presents investigators the world over with the unique opportunity to examine these structural birth defects across a much broader representation of the global human population than was ever before possible. More information about this study can be found here.
On July 21, a dataset associated with Dr. Sharon Plon’s Kids First study to identify novel cancer susceptibility mutations from unselected childhood cancer patients and parent trios was released onto the Kids First Data Resource Portal. It contains more than 54 TB of WGS data, including aligned and unaligned reads, variant calls, and gVCF data from 291 study participants and 114 families. These data represent more than 65 childhood cancer diagnoses, with normal tissue collected entirely from trio family groups. Learn more about Dr. Plon’s study here.
Finally, on August 24, the Kids First Data Resource Portal released a dataset from Dr. Kenan Onel of the University of Chicago, related to his Kids First study, An Integrated Clinical and Genomic Analysis of Treatment Failure in Pediatric Osteosarcoma. The dataset contains 14.16 TB of aligned read WGS and WXS data from 84 study participants. These data are derived from tumor, metastasis, normal, and relapse tissue from children diagnosed with osteosarcoma. Learn more about the study here.
Expanding Our Tools and Resources
In May, the Kids First DRC was proud to launch its new HPO Ontology Browser, which allows users to more easily search phenotypic data found across datasets within the Data Resource Portal. Because the phenotypic data are harmonized using Human Phenotype Ontology standards, users can now examine a more complete hierarchy of ontology terms, with a clearer view of the relationships between higher and lower phenotypic levels than was previously possible. With this tool, data within the portal become more useful for searching and cohort selection, analytics, and mechanism discovery. Users can now search by a number of categories, including phenotypic abnormality, clinical course, mode of inheritance, blood group, past medical history, and more.
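To illustrate why a hierarchy helps with cohort selection, here is a minimal Python sketch. It is not the Portal's implementation: the hierarchy below is a simplified toy tree (the real HPO is a graph in which a term can have several parents), and the participant IDs and the function names `ancestors`, `descendants`, and `select_cohort` are hypothetical.

```python
# Toy, simplified phenotype hierarchy: child label -> parent label.
# This is NOT the real HPO graph; it only illustrates the idea.
PARENT = {
    "Phenotypic abnormality": None,
    "Abnormality of head or neck": "Phenotypic abnormality",
    "Abnormality of the mouth": "Abnormality of head or neck",
    "Cleft palate": "Abnormality of the mouth",
    "Cleft lip": "Abnormality of the mouth",
    "Hypertelorism": "Abnormality of head or neck",
}


def ancestors(term):
    """Return the chain of broader terms from `term` up to the root."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def descendants(term):
    """Return every term that sits below `term`, at any depth."""
    return sorted(t for t in PARENT if term in ancestors(t))


# Hypothetical participants and their annotated phenotypes.
PARTICIPANTS = {
    "P1": {"Cleft palate"},
    "P2": {"Hypertelorism"},
    "P3": {"Cleft lip", "Cleft palate"},
}


def select_cohort(term):
    """Select participants annotated with `term` or any narrower term."""
    wanted = {term, *descendants(term)}
    return sorted(p for p, phenos in PARTICIPANTS.items() if phenos & wanted)


if __name__ == "__main__":
    print(ancestors("Cleft palate"))
    print(select_cohort("Abnormality of the mouth"))
```

Searching on a higher-level term picks up participants annotated with any of its narrower terms, which is the practical benefit of browsing phenotypes as a hierarchy instead of as a flat list.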
Also in May, our Scientific Community Outreach team launched an all-new support website to assist users of the Kids First Data Resource Portal. Called the Kids First DRC Help Center, the site offers a completely updated suite of user support materials, including answers to frequently asked questions, guides for applying for data access and pulling data for analysis, how-tos for using analysis tools and other features of the portal to their fullest potential, study updates, and much, much more!
To complement the release of these new support resources, our newly appointed Kids First Scientific Community Program Manager, Dr. David Higgins, recently launched our monthly Kids First User Support Open Office Hours. These hour-long conversations, available via the BlueJeans phone and video conferencing platform, allow researchers to interact in real time with Kids First DRC Principal Investigators, bioinformaticians, and data experts, to explore topics of importance in data analysis and research, and to troubleshoot questions related to use of the Data Resource Portal, CAVATICA, and PedcBioPortal.
All users are invited to attend, from beginners to those who are more advanced. Interested researchers can contact David Higgins at higginsd@email.chop.edu for details on how to join each session by phone or video chat.
Finally, via GitHub, our bioinformatics experts released a number of helpful applications and tools, including an end-to-end tutorial on running pipelines, a full synchronization between the GitHub repository and CAVATICA, and a number of upgrades and minor enhancements to our alignment workflows.
Collaboration and Interoperability
The Kids First DRC, in collaboration with a number of other NIH programs and initiatives, has also made significant progress on a number of fronts including cloud-based interoperability and precision modeling.
Fast Healthcare Interoperability Resources (FHIR) is an interoperability specification for the secure exchange of Electronic Health Information, developed by Health Level Seven International (HL7). The aim of FHIR is to address the growing digitization of the healthcare industry and the need for patient records to be readily “available, discoverable, and understandable.” However, these standards can also be utilized to model and facilitate the exchange of clinical data for research purposes.
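As a rough illustration of what modeling clinical data for research in FHIR can look like, the Python sketch below builds a `Patient` and a `Condition` resource and wraps them in a `Bundle`. The resource types and field names follow the FHIR specification, but this is not the Kids First data model: the IDs, the helper function names, and the coding system URI used for HPO are assumptions made for the example.

```python
import json

# Assumed coding system URI for HPO terms; a real implementation
# should use whatever URI its FHIR profile or server specifies.
HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"


def make_patient(patient_id, gender):
    """Build a minimal FHIR Patient resource as a plain dict."""
    return {"resourceType": "Patient", "id": patient_id, "gender": gender}


def make_condition(condition_id, patient_id, code, display):
    """Build a minimal FHIR Condition that points at a Patient."""
    return {
        "resourceType": "Condition",
        "id": condition_id,
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{"system": HPO_SYSTEM, "code": code, "display": display}],
            "text": display,
        },
    }


def make_bundle(resources):
    """Wrap resources in a FHIR Bundle of type 'collection'."""
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in resources],
    }


# Hypothetical participant with a cleft palate phenotype (HP:0000175).
patient = make_patient("example-participant-1", "female")
condition = make_condition(
    "example-condition-1", patient["id"], "HP:0000175", "Cleft palate"
)
bundle = make_bundle([patient, condition])

if __name__ == "__main__":
    # JSON is the usual wire format for exchanging FHIR resources.
    print(json.dumps(bundle, indent=2))
```

Because each resource is self-describing and links to others by reference (here, `Condition.subject` points to the `Patient`), different platforms can exchange and interpret the same records without sharing a database schema.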
In light of ongoing NIH Cloud-Based Platforms Interoperability (NCPI) technical efforts (of which the Kids First DRC is a part), new efforts are underway that aim to prototype FHIR and create a roadmap for clinical data interoperability between datasets hosted by NCPI participants including the NIH Common Fund Kids First Data Resource, NCI CRDC, the NHLBI BioData Catalyst, and the NHGRI AnVIL.
The Kids First DRC is in a unique position to foster end-user analysis across multiple NIH platforms, given its charge to promote interoperability across the diverse pediatric datasets represented within the Kids First program. Additionally, the DRC has been actively involved in NCPI working groups and in NIH Research Authorization Service (RAS) development, and is already pursuing multiple trans-NIH use cases.
Currently, feasibility studies are underway for implementing common APIs, GA4GH standards, and FHIR-based information exchange methods in alignment with the NCPI working groups’ broad-based consensus efforts.
Progress on this front is ongoing. Watch the Kids First DRC’s blog for further details of this effort, coming soon!
Keeping Up the Momentum
The Gabriella Miller Kids First Pediatric Research Program and Data Resource Center have a number of exciting new projects and partnerships in development, which we look forward to rolling out in the months ahead. These include compiling and growing our datasets, supporting additional studies, and collaborating with experts across the NIH landscape. Through continued development of this data resource, our experts and administrators are committed to supporting scientific discoveries that will bring us closer to better treatments and scientific insights for the benefit of children everywhere.
To get involved in these efforts, contact the Kids First DRC team at support@kidsfirstdrc.org. You can also visit our Quick-Start guide to create your free Kids First Data Resource Portal account today!