Exome sequencing is widely used in genetic studies of human diseases and
clinical genetic diagnosis. Accurate detection of copy number variants
(CNVs) is important to fully utilize exome sequencing data. However, exome
data are noisy. No existing method alone achieves both high
precision and high recall. A common practice is to perform heuristic
filtration followed by manual inspection of read depth of putative CNVs.
This approach does not scale in large studies. To address this issue, we
developed a transfer learning method, CNV-espresso, for in silico
confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes
candidate CNVs from exome data as images and uses pretrained convolutional
neural network models to classify copy number states. We trained
CNV-espresso using an offspring-parent trio exome sequencing dataset, with
inherited CNVs as positives and CNVs with Mendelian errors as negatives. We
evaluated the performance using additional samples that have both exome and
whole-genome sequencing (WGS) data. Using the CNVs detected from WGS
data as a proxy for ground truth, CNV-espresso significantly improves
precision while keeping recall almost intact, especially for CNVs that span
a small number of exons. CNV-espresso can effectively replace manual
inspection of CNVs in large-scale exome sequencing studies.
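
The evaluation described above, in which exome CNV calls are scored against WGS calls treated as a proxy for ground truth, can be illustrated with a minimal sketch. It is not the CNV-espresso implementation. The tuple layout, the function names, the half-open coordinates, and the 50% reciprocal-overlap matching rule are all assumptions made for illustration; the text does not specify how exome and WGS calls were matched.

```python
from typing import List, Tuple

# Hypothetical CNV record: (sample, chrom, start, end, cnv_type).
# Coordinates are assumed to be half-open [start, end).
CNV = Tuple[str, str, int, int, str]


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls
    differ in sample, chromosome, or CNV type, or do not overlap."""
    if a[0] != b[0] or a[1] != b[1] or a[4] != b[4]:
        return 0.0
    inter = min(a[3], b[3]) - max(a[2], b[2])
    if inter <= 0:
        return 0.0
    return min(inter / (a[3] - a[2]), inter / (b[3] - b[2]))


def precision_recall(
    exome_calls: List[CNV],
    wgs_calls: List[CNV],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the text
) -> Tuple[float, float]:
    """Score exome calls against WGS calls used as a proxy truth set."""
    # Exome calls supported by at least one WGS call.
    supported = sum(
        1
        for c in exome_calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in wgs_calls)
    )
    # WGS calls recovered by at least one exome call.
    recovered = sum(
        1
        for t in wgs_calls
        if any(reciprocal_overlap(c, t) >= min_overlap for c in exome_calls)
    )
    precision = supported / len(exome_calls) if exome_calls else 0.0
    recall = recovered / len(wgs_calls) if wgs_calls else 0.0
    return precision, recall


# Toy data, invented purely to show the calculation.
wgs = [("s1", "chr1", 100, 200, "DEL"), ("s1", "chr2", 500, 900, "DUP")]
raw_exome = [("s1", "chr1", 110, 210, "DEL"), ("s1", "chr3", 10, 50, "DEL")]
confirmed_exome = raw_exome[:1]  # pretend a classifier rejected the second call

before = precision_recall(raw_exome, wgs)
after = precision_recall(confirmed_exome, wgs)
```

In this toy case, dropping the unsupported call raises precision while recall is unchanged, which is the behavior a confirmation step is meant to achieve.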