SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and
still-active transposable elements (TEs) in the human genome. Several
pathogenic SVA insertions have been identified that directly mutate host
genes to cause neurodegenerative and other types of diseases. However, due
to their sequence heterogeneity and complex structures as well as
limitations in sequencing techniques and analysis, SVA insertions have been
less well studied than other mobile element insertions. Here, we
identified polymorphic SVA insertions from 3646 whole-genome sequencing
(WGS) samples from >150 diverse populations and constructed a polymorphic SVA
insertion reference catalog. Using 20 long-read samples, we also assembled
reference and polymorphic SVA sequences and characterized the internal
hexamer/variable-number-tandem-repeat (VNTR) expansions as well as
differences in SVA activity across SVA subfamilies and human populations. In
addition, we developed a module to annotate both reference and polymorphic
SVA copies. By characterizing the landscape of both reference and
polymorphic SVA retrotransposons, our study enables more accurate
genotyping of these elements and facilitates the discovery of pathogenic SVA
insertions.