Improvements in Gene Panel and Whole Exome sequencing
Molecular barcoding refers to the attachment of a unique molecular sequence (e.g. ATGCCA), known as a unique molecular identifier (UMI), to each individual DNA fragment in a sample. This ensures that in downstream analysis, unique sequence reads can be separated from those that result from PCR duplication.
A common practice for handling PCR duplicates is to search the dataset for identical sequences and remove every copy after the first. This approach assumes that identical reads were created from the same DNA molecule by PCR. That assumption can be flawed: as sequencing throughput rises, so does the chance of discarding identical reads that in fact originate from different DNA molecules.
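The difference between the two deduplication strategies can be illustrated with a minimal Python sketch. The `Read` type, the function names and the toy reads are hypothetical and chosen for illustration only. Production pipelines typically also use the mapping position of each read and tolerate sequencing errors within the UMI itself, neither of which is modelled here.

```python
from collections import namedtuple

# Hypothetical minimal read representation: a UMI plus the read sequence.
Read = namedtuple("Read", ["umi", "sequence"])


def dedup_by_sequence(reads):
    """Keep the first read for each distinct sequence, ignoring UMIs."""
    seen = set()
    kept = []
    for read in reads:
        if read.sequence not in seen:
            seen.add(read.sequence)
            kept.append(read)
    return kept


def dedup_by_umi(reads):
    """Keep the first read for each distinct (UMI, sequence) pair."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.umi, read.sequence)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy data (assumed, not from a real run).
reads = [
    Read("ATGCCA", "ACGTACGTAC"),  # original molecule 1
    Read("ATGCCA", "ACGTACGTAC"),  # PCR duplicate of molecule 1
    Read("GGTTAC", "ACGTACGTAC"),  # molecule 2: same sequence, different UMI
]

print(len(dedup_by_sequence(reads)))  # sequence-only: molecule 2 is lost
print(len(dedup_by_umi(reads)))       # UMI-aware: both molecules are kept
```

In this toy case, sequence-only deduplication collapses all three reads into one, while the UMI-aware version removes only the true PCR duplicate.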
What are the advantages of separating PCR artifacts from real variants present in the original molecules?
- Deduplication based on UMIs is more accurate than solely based on removal of identical reads
- Enables variant detection at a much lower Variant Allele Frequency (VAF)
- Significant reduction of the false positive rate, thereby increasing the specificity
- Enables ‘Read Correction’ by making use of duplicate reads
- Improves Copy Number Variant (CNV) or haplotyping analysis, since UMIs represent the most accurate method to perform computational cleanup of data
Enzymatic shearing is replacing mechanical shearing because of concerns about cost, throughput and yield. It also reduces the risk of sample swaps, since no true high-throughput options are available for mechanical shearing. However, enzymatic shearing leads to slightly less random fragment ends, which can be effectively counteracted by the inclusion of UMIs.
Reads that UMIs identify as true duplicates, originating from the same original molecule, can be put to good use. Instead of being discarded, these reads can be used for error correction. As shown below, PCR and sequencing errors that occur at low frequency can be identified, and a consensus sequence can be generated from multiple true duplicate reads (see figure below). Consensus building is common practice in bioinformatics analysis, but could not be performed effectively before UMIs were added in the wet lab.
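As a rough illustration of this kind of read correction, the sketch below groups reads by UMI and builds a per-position majority-vote consensus for each group. The function names and the toy reads are hypothetical, and the sketch assumes that all reads sharing a UMI have equal length and are already aligned to one another. Real consensus callers usually also weigh base qualities and require a minimum family size.

```python
from collections import Counter, defaultdict


def consensus(sequences):
    """Majority-vote consensus over equal-length, pre-aligned reads.

    Ties are resolved in favour of the base seen first in that column.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def consensus_by_umi(reads):
    """Group (umi, sequence) pairs by UMI and return one consensus per UMI."""
    families = defaultdict(list)
    for umi, sequence in reads:
        families[umi].append(sequence)
    return {umi: consensus(seqs) for umi, seqs in families.items()}


# Toy data (assumed): one read in the ATGCCA family carries an A>T error
# at the fifth base, introduced during PCR or sequencing.
reads = [
    ("ATGCCA", "ACGTACGT"),
    ("ATGCCA", "ACGTTCGT"),  # low-frequency error
    ("ATGCCA", "ACGTACGT"),
    ("GGTTAC", "ACGTACGA"),  # single-read family, returned unchanged
]

result = consensus_by_umi(reads)
print(result["ATGCCA"])  # error is outvoted by the other two reads
print(result["GGTTAC"])
```

The error in the second read is outvoted by the two other members of its UMI family, so the consensus for that molecule matches the error-free sequence.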
UMIs are part of the GenomeScan DNA-seq portfolio
For Whole Exome Sequencing (WES) and targeted gene panels, the GenomeScan expert team foresees major advances in oncology and pathology studies that report on low-frequency variants. For clinical geneticists, UMIs offer a means of optimizing sensitivity and specificity in WES-based diagnostics and of developing robust CNV data analysis pipelines. We are implementing UMIs in all of our DNA sequencing workflows:
- Microbiome analysis
- Whole Genome Sequencing (WGS)
- Gene panel sequencing (Launch Q3 2019)
- Whole Exome Sequencing (WES)*
(*UMIs implemented in 2020)
Do you work with any of the above methods? Tell us about your project and let’s see how your experiment can benefit from the use of UMIs.