When you start analyzing your gene-expression data set, several questions come to mind whose answers can seriously affect the outcome.
- What if any of the PCR steps during library prep (Figure 1) has introduced an error? It might have led to a base change, which I could incorrectly score as a mutation.
- Did PCR bias favor amplification of some of the DNA sequences? I would obtain more copies of some genes than of others, resulting in an inaccurate fold-change measurement.
- What if two identical DNA sequences are not PCR duplicates but come from distinct molecules? I would erroneously discard the supposed duplicate read.
Fortunately, all of these questions can be addressed by adding a small but crucial element: Unique Molecular Identifiers (UMIs). UMIs improve the quality of your data set not only in RNA sequencing but, in very similar ways, also in DNA sequencing.
Figure 1: Gene expression profiling workflow.
The main steps in this process are RNA extraction from cells, conversion into cDNA, library amplification by PCR, and finally sequencing to generate the expression profile.
What are Unique Molecular Identifiers?
Unique Molecular Identifiers, or UMIs, are short sequences such as “CATGAC” that tag individual molecules in your original sample. They act as a barcode (Figure 2), helping us to determine whether identical reads arise from distinct molecules or from PCR amplification of a single molecule. This makes it possible to correct errors and count molecules more accurately than the read sequence alone allows.
Figure 2: Example of UMI adapter and schematic representation of the usage of UMIs.
Tagging each seemingly identical molecule with a barcode helps to differentiate their copies after PCR.
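To make the barcode idea concrete, here is a minimal sketch of UMI-based molecule counting. It assumes each read has already been reduced to a (gene, UMI) pair; the function name `count_molecules` and the example reads are hypothetical and only serve as an illustration, not as a description of any specific pipeline.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per gene.

    `reads` is an iterable of (gene, umi) tuples. Reads that share both
    gene and UMI are treated as PCR copies of one original molecule.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Hypothetical example reads: (gene, UMI)
reads = [
    ("GAPDH", "CATGAC"),
    ("GAPDH", "CATGAC"),  # same UMI: counted as a PCR copy
    ("GAPDH", "TTGACA"),  # different UMI: a distinct molecule
    ("ACTB", "GGATCC"),
]

molecule_counts = count_molecules(reads)
print(molecule_counts)
```

In this example, four reads collapse to three original molecules. The sketch treats every distinct UMI string as a distinct molecule; in practice, sequencing errors inside the UMI itself also need to be handled, for example by merging UMIs that differ by a single base.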
How does your NGS project benefit from UMIs?
In gene-expression profiling studies, removing true PCR duplicates with UMIs gives a more accurate representation of the transcripts. This outperforms standard duplicate removal during data analysis, which cannot tell identical molecules from PCR copies, especially when profiling small cell populations where little input material is available, such as in liquid biopsies.
The largest advances for UMIs in DNA-based studies are error correction and reduction of the false-positive rate, from single-gene panels to Whole Exome Sequencing (WES) projects. True duplicates can be used to correct PCR and sequencing errors, so pathologists and oncologists can reliably report on low-frequency variants. Data-analysis pipelines that include UMI-based corrections can detect Copy Number Variations (CNVs) more reliably. Overall, UMIs contribute to a higher level of sensitivity and specificity of all NGS-based methods.
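The error-correction idea can be sketched as a per-UMI consensus: reads that share a UMI are copies of one molecule, so a base seen in only a minority of copies is likely a PCR or sequencing error. The snippet below is an illustrative sketch only. The function name `consensus_by_umi`, the `min_fraction` threshold, and the example reads are assumptions, and the reads within a family are assumed to be aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_fraction=0.6):
    """Build one consensus sequence per UMI family.

    `reads` is an iterable of (umi, sequence) tuples. At each position the
    most common base is kept if it reaches `min_fraction` of the family;
    otherwise the position is masked with "N".
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        bases = []
        # zip(*seqs) walks the family column by column (one position at a time)
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_fraction else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical example reads: (UMI, sequence)
reads = [
    ("CATGAC", "ACGTACGT"),
    ("CATGAC", "ACGTACGT"),
    ("CATGAC", "ACGAACGT"),  # one copy carries an error at position 4
    ("TTGACA", "ACGAACGT"),
    ("TTGACA", "ACGAACGT"),  # every copy agrees: likely a true variant
]

consensus = consensus_by_umi(reads)
print(consensus)
```

In the first family, the lone "A" is outvoted and corrected, while in the second family the same base is supported by every copy and is retained. A real pipeline would typically also weigh base qualities and require a minimum family size before calling a consensus.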
With the appropriate bioinformatics analysis, quantification becomes simpler and more robust. Our GenomeScan bioinformatics experts have developed and validated pipelines that make full use of UMIs to improve data quality.
In our laboratory, the team is currently testing and implementing UMIs for all services (*UMIs implemented in 2020):
- Gene-expression profiling
- Total transcriptome
- Small RNAseq*
- Ultra-low input transcriptomics
- Whole Genome Sequencing (WGS)
- Whole Exome Sequencing (WES)*
- Gene panel sequencing
- Microbiome analysis
Do you work with any of the above methods? Tell us about your project and let’s see how your experiment can benefit from the use of UMIs.