Gabriella Miller Kids First Data Resource Center Updates – December 2021
As many of us can report, it seems as though the past 12 months have passed by in a flash! 2021 was a year of great scientific progress, punctuated by COVID-19 related successes and setbacks, unprecedented levels of international data sharing, and renewed calls for open scientific collaboration from patient families and health advocates across the US and beyond.
This past year, the NIH Common Fund-supported Gabriella Miller Kids First Data Resource Center (Kids First DRC) also experienced a regular stream of new developments. Seven new studies were added to the Kids First Data Resource, representing more than 311TB of clinical and omic data derived from nearly 7,000 pediatric research participants. Kids First experts shared research progress at numerous national and international scientific conferences. New tools and functional improvements were also implemented, including our Explore Data tool, HPO Ontology Browser, Variant Search feature, and User Login framework.
The Kids First DRC is incredibly proud of the growth we’ve experienced in 2021. As we turn toward the start of a new year, the Kids First team remains committed to fostering collaboration and new discoveries across the pediatric research landscape.
Expanded Data Resources
Over the past several months, more than 153TB of new data have been added to the Kids First Data Resource, representing more than 1,250 families and 3,500 study participants. With the addition of these new resources, the portal has grown to encompass 30 datasets from studies in childhood cancers and structural birth defects, with more than 1.5PB of clinical and multi-omic data available to the research community.
September saw the addition of two new datasets, starting with the Kids First study in Myeloid Malignancies. Titled “Germline and Somatic Variants in Myeloid Malignancies in Children”, this X01 Study is led by Dr. Soheil Meshinchi of the Fred Hutchinson Cancer Research Center. Comprising both tumor and normal tissue donated by 408 study participants, the dataset from this study contains 1,632 whole genome sequences (WGS) and 2,730 RNA sequences (RNA-Seq) generated primarily by the Kids First sequencing center at the HudsonAlpha Institute. 37.86TB of data, including Aligned Reads, Gene Expression, Gene Fusions, Variant Calls, and gVCFs, are currently available by request.
September’s second addition to the Kids First portal was data from the Kids First X01 Study in Leukemia and Heart Defects in Down syndrome. Led by Drs. Philip Lupo and Stephanie Sherman of the Baylor College of Medicine, the study is titled “Genomic Analysis of Congenital Heart Defects and Acute Lymphoblastic Leukemia in Children with Down Syndrome”. 2,037 study participants contributed tumor and normal tissue for this study, which was sequenced primarily by the Kids First sequencing center at the Broad Institute. 8,749 whole genome sequences were generated for this study, totaling 48.19TB of data. These data include Simple Nucleotide Variations, Aligned Reads, gVCF, Somatic Copy Number Variations, and Somatic Structural Variations.
In November, the Kids First study in T-cell Acute Lymphoblastic Leukemia (T-Cell ALL) was added to the portal. Led by Dr. David Teachey of the Children’s Hospital of Philadelphia, the study is titled “Comprehensive Genomic Profiling to Improve Prediction of Clinical Outcome for Children with T-cell Acute Lymphoblastic Leukemia”. With 2,276 whole genomes sequenced at the Broad Institute, this dataset contains 67.89TB of data derived from 1,138 study participants. Data include Aligned Reads and Variant Calls.
At the end of November, the Kids First DRC announced a key upgrade to the Data Resource Portal, in the form of a new interface and updated framework for logging in to access and search data. The upgrade enabled logins using the NIH Research Auth Service (RAS) for improved interoperability and continued security of the data assets housed within the Kids First Data Resource.
By enabling the use of RAS login credentials, the Kids First DRC gives investigators across the NIH easier access to the Data Resource portal. This upgrade also improves interoperability with other NIH programs and ICs, allowing Kids First users to access and more easily combine datasets within those external NIH environments.
Also this quarter, data engineers at the Kids First DRC vastly expanded our Variant Search tool. Data from 13 additional studies have been uploaded to the Variant Search tool, bringing the total number of available datasets from 6 to 19.
With the Variant Search tool, Kids First users can perform highly refined queries of genomic data as well as clinical data. With the ability to seek out specific genes and gene variants not only within a single disease area or dataset but across diagnoses and cohorts, users can explore similar underlying causes of cancers, structural birth defects, and other rare disorders. By defining similar factors that may underlie two or more seemingly unrelated diagnoses, investigators may discover commonalities in pathogenesis or novel treatment strategies.
The Data Resource team anticipates the addition of more data to the Variant Search tool over the next several months.
Kids First Partnerships
The upgrade to the Kids First login framework is part of a broader effort by the Kids First Data Resource to support the NIH Cloud Platform Interoperability (NCPI) initiative. This collaborative initiative seeks to support new discoveries in medical science by creating a federated genomic data ecosystem, easily accessible to investigators across a wide variety of research specialties.
By enabling login credentials applicable to a broader cross-section of research disciplines across the NIH, Kids First hopes to support cross-collaboration and new discoveries across disease areas, in both children and adults.
Users of the Kids First Data Resource have more ways than ever to seek help in navigating the Portal and using our analysis tools to their fullest ability.
In addition to our regular Office Hours sessions, held virtually on the second Tuesday of each month from 3:00pm to 4:00pm ET, and on-demand concierge support from our team of data experts, users can find a wealth of valuable information at the Kids First DRC Help Center and User Support Forum.
At the Help Center, find answers to frequently asked questions, review release notes for Kids First Studies, access how-to guides for searching and analyzing data, and more!
On the Support Forum, users can find the latest technical announcements from the Kids First data team, pose questions, engage in dialog with other portal users, and search conversation threads to get the helpful information they need.
To learn more about each of these support resources, contact our Scientific Community Program Manager, Dr. David Higgins, at firstname.lastname@example.org.
Inviting New Collaborations
The Kids First DRC is continuously working to forge new collaborations and assist investigators around the world and across the childhood cancer and structural birth defect research landscape in maximizing the discovery potential of their scientific efforts. Whether partnering with other research efforts or working with researchers to optimize the user experience, Kids First DRC experts and administrators are committed to supporting efforts to develop better treatments and scientific insights for the benefit of children everywhere.