
Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis.

Graph-based genome reference representations have seen significant
development, motivated by the inadequacy of the current human genome
reference to represent the diverse genetic information from different human
populations and its inability to maintain the same level of accuracy for
non-European ancestries. While there have been many efforts to develop
computationally efficient graph-based toolkits for NGS read alignment and
variant calling, the curation of genomic variants and the subsequent
construction of genome graphs remain understudied problems that inevitably
determine the effectiveness of the overall bioinformatics pipeline. In
this study, we discuss obstacles encountered during graph construction and
propose methods for sample selection based on population diversity, graph
augmentation with structural variants and resolution of graph reference
ambiguity caused by information overload. Moreover, we present the case for
iteratively augmenting tailored genome graphs for targeted populations and
demonstrate this approach on whole-genome samples of African ancestry.
Our results show that population-specific graphs, as more representative
alternatives to linear or generic graph references, can achieve
significantly fewer read mapping errors and higher variant calling
sensitivity, while also providing the benefits of joint variant calling
without the need for computationally intensive post-processing steps.
