Intron reads matter in single-cell RNA sequencing data. Why?

10.11.2025


When eukaryotic cells express a gene, its DNA is transcribed into precursor messenger RNA (pre-mRNA), which must then be processed into a mature, functional mRNA transcript. These processing steps (5’ capping, splicing and 3’ polyadenylation) increase mRNA stability, but they can also be a major source of transcript diversity. Through alternative splicing, different combinations of exons (and sometimes introns) are kept in the mature transcript, so a single gene can produce diverse transcripts¹.

Detecting intronic reads in single-nucleus (snRNA-seq) and single-cell RNA sequencing (scRNA-seq) data can uncover regulatory processes such as intron retention (IR)². However, reads mapped to introns can also arise from biases introduced during library preparation and sequencing.

Those reads may represent sequencing artefacts rather than genuine biological signal.

There are two ways to address sequencing artefacts: they can be minimized during library preparation, or they can be removed with bioinformatics tools. Either way, ensuring that intronic reads represent true signals makes it possible to explore the full complexity of transcriptional regulation in eukaryotic cells.

Biases from library preparation that enrich for intron-aligned reads

Understanding the technical biases that can artificially enrich intron-aligned reads is the first step towards correctly interpreting intron reads in scRNA-seq data. The following sources of bias can produce false positives when detecting introns:

DNA contamination: DNA is found everywhere, both inside cells and as extracellular DNA. When DNase treatment fails to degrade this DNA before library preparation, the contaminating fragments are amplified and sequenced alongside the complementary DNA (cDNA). Because genomic DNA covers introns as well as exons, contamination inflates intronic read counts, which can then be misattributed to nascent mRNA transcripts. Contaminating DNA can also mask true IR events in mRNA transcripts³.

Mispriming: Mispriming occurs when reverse transcription primers anneal to RNA at regions of only partial complementarity. Mispriming is common across RNA-seq technologies, including scRNA-seq, and it produces sequencing artefacts that can be mistaken for introns or other genomic features⁴.

Internal oligo(dT) priming: Smart-seq⁵⁻⁷, Drop-seq⁸, and other library preparation techniques use oligo(dT) priming. Here, oligo(dT) primers hybridize with the poly(A) tail to prime reverse transcription and selectively capture mature, polyadenylated mRNA. However, oligo(dT) primers can also anneal internally to adenine-rich stretches encoded in the genome, and polyadenylation events also occur within introns⁹. Priming from these sites can artificially enrich coverage of intronic regions¹⁰.

Incomplete fragmentation: Incomplete fragmentation can be identified by an excess of long fragments in the library. Long fragments are sequenced inefficiently, particularly on short-read platforms. As a result, some transcripts may be missed while others are artificially enriched.
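Internal oligo(dT) priming is often screened for computationally by inspecting the genomic sequence immediately downstream of the position where a read's cDNA would have been primed. If that sequence is A-rich, the read may come from internal priming rather than from a true poly(A) tail. The sketch below is a simplified, hypothetical filter. The function name, the 20-nt window, the 60% adenine threshold and the 8-nt run threshold are illustrative assumptions, not parameters of any specific pipeline.

```python
def is_likely_internal_priming(
    downstream_seq: str,
    window: int = 20,
    max_a_fraction: float = 0.6,
    max_a_run: int = 8,
) -> bool:
    """Return True if the genomic sequence just downstream of a read's
    priming site is A-rich enough that oligo(dT) could have primed there.

    Hypothetical heuristic: `downstream_seq` must already be in transcript
    orientation (reverse-complemented for minus-strand genes), and all
    thresholds are illustrative, not taken from a published pipeline.
    """
    seq = downstream_seq[:window].upper()
    if not seq:
        return False
    # Fraction of adenines in the inspected window.
    a_fraction = seq.count("A") / len(seq)
    # Longest run of consecutive adenines in the window.
    longest_run = current_run = 0
    for base in seq:
        current_run = current_run + 1 if base == "A" else 0
        longest_run = max(longest_run, current_run)
    return a_fraction >= max_a_fraction or longest_run >= max_a_run


# A site followed by a genome-encoded A-stretch is flagged; mixed sequence is not.
print(is_likely_internal_priming("A" * 9 + "C" + "GT" * 5))  # True (run of 9 A)
print(is_likely_internal_priming("ACGT" * 5))                # False
```

Reads flagged this way would typically be set aside rather than counted as evidence of intronic transcription.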

How correctly identifying introns can produce novel insights from single-cell gene expression data

When intron-assigned reads derived from technical biases are removed, the remaining intron-mapped reads can reveal the role that introns play in transcriptional regulation, genetic diversity, and human health. Various intron-related phenomena are associated with critical developmental and physiological processes, including:

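RNA velocity: RNA velocity uses the relative abundance of unspliced (intron-containing) and spliced reads for each gene to estimate whether its mature mRNA level is rising or falling. This estimate reflects the kinetics of transcription, splicing and degradation. Correctly distinguishing nascent from mature mRNA transcripts in scRNA-seq data can therefore reveal dynamic changes in gene expression within individual cells. This information, in turn, can help predict cell lineage differentiation¹¹ and explain the evolution of tumour microenvironments¹².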

IR events: IR occurs when an intron is retained in a mature mRNA transcript. Retained introns can reduce the expression of transcripts that lack physiological relevance in a given cell type¹³. Differences in IR patterns have been implicated in various diseases, including Alzheimer’s disease¹⁴ and acute myeloid leukemia (AML)¹⁵.

Splicing variants and single-nucleotide variants (SNVs): Splicing variants and SNVs are major sources of human transcriptomic and genetic diversity. Like IR events, splicing variants arise from alternative splicing, through which a single gene can produce different transcript variants. SNVs, by contrast, arise from changes in the DNA sequence itself. When SNVs fall in intronic regions, they can alter splice sites and splicing regulatory elements. This affects alternative splicing and modulates disease risk¹⁶.
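Two of the quantities above depend directly on intronic reads. The sketch below illustrates both in simplified form.

The first is a steady-state RNA velocity estimate. It uses the common kinetic model du/dt = α − βu and ds/dt = βu − γs, where u is unspliced and s is spliced abundance, with β set to 1. The sketch fits γ by least squares through the origin on all cells; real implementations usually fit on cells at the expression extremes, so treat this as a simplification.

The second is an intron retention ratio in the spirit of the one reported by tools such as IRFinder: intronic abundance divided by intronic plus spliced abundance.

All function names and inputs are hypothetical.

```python
import math

import numpy as np


def fit_gamma(unspliced: np.ndarray, spliced: np.ndarray) -> float:
    """Least-squares slope of u ≈ gamma * s through the origin.

    Simplification: fitted on all cells rather than on expression extremes.
    """
    denom = float(np.dot(spliced, spliced))
    return float(np.dot(unspliced, spliced)) / denom if denom > 0 else 0.0


def steady_state_velocity(unspliced, spliced) -> np.ndarray:
    """Per-cell velocity of one gene under ds/dt = beta*u - gamma*s, beta = 1."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    gamma = fit_gamma(u, s)
    # > 0: more unspliced RNA than the steady state predicts (induction);
    # < 0: less than predicted (repression).
    return u - gamma * s


def intron_retention_ratio(intronic_depth: float,
                           left_junction_reads: int,
                           right_junction_reads: int) -> float:
    """Simplified IR ratio: intronic abundance over intronic plus spliced
    abundance, taking the larger of the two flanking junction read counts."""
    spliced = max(left_junction_reads, right_junction_reads)
    total = intronic_depth + spliced
    return intronic_depth / total if total > 0 else math.nan


print(steady_state_velocity([3, 0], [1, 1]))  # [ 1.5 -1.5]
print(intron_retention_ratio(10, 30, 20))      # 0.25
```

Both quantities become unreliable if intronic counts are inflated by the technical biases described earlier. This is why filtering artefacts matters before interpreting them.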

How scientists should analyze sequences mapped to intronic regions

Confirming the function of intronic reads in single-cell transcriptomes ultimately requires in vitro and in vivo experiments. Nevertheless, you can build validation into your transcriptomics pipeline by considering the following aspects of your scRNA-seq workflow:

Consider whether to perform scRNA-seq or snRNA-seq

The nucleus contains a large proportion of nascent, unspliced transcripts. As a result, snRNA-seq data tend to have more reads mapped to intronic regions. While snRNA-seq can characterize transcripts from hard-to-dissociate tissues, scRNA-seq can detect more genes for quantification¹⁷.
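A quick way to check whether a dataset behaves as expected for its protocol is to compute, for each cell, the fraction of counts that come from unspliced (intronic) reads. Higher values would be expected in snRNA-seq than in scRNA-seq. The sketch below assumes you already have hypothetical cells × genes matrices of spliced and unspliced counts, for example from a velocity-style counting step. The matrices in the usage lines are toy values, not real data.

```python
import numpy as np


def intronic_fraction_per_cell(spliced, unspliced) -> np.ndarray:
    """Fraction of each cell's counts that come from unspliced (intronic) reads.

    `spliced` and `unspliced` are hypothetical cells x genes count matrices.
    """
    s = np.asarray(spliced, dtype=float)
    u = np.asarray(unspliced, dtype=float)
    if s.shape != u.shape:
        raise ValueError("spliced and unspliced matrices must have the same shape")
    u_total = u.sum(axis=1)
    total = s.sum(axis=1) + u_total
    # Cells with no counts at all get NaN instead of a division-by-zero warning.
    frac = np.full(total.shape, np.nan)
    np.divide(u_total, total, out=frac, where=total > 0)
    return frac


# Toy matrices (2 cells x 2 genes) standing in for an scRNA-seq and an snRNA-seq sample.
sc_frac = intronic_fraction_per_cell([[8, 2], [6, 4]], [[1, 1], [2, 0]])
sn_frac = intronic_fraction_per_cell([[3, 1], [2, 2]], [[4, 2], [5, 1]])
print(np.nanmedian(sc_frac), np.nanmedian(sn_frac))
```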

Be aware of the technical biases that can emerge in your data

We have discussed several sources of technical bias that can confound the interpretation of single-cell transcriptomes if left unaccounted for. For example, oligo(dT) priming produces 3’ coverage bias in poly(A) RNA libraries, which affects transcript quantification and gene expression analyses¹⁸. This enrichment also makes introns located closer to the 5’ end of mRNA transcripts harder to detect¹⁸. Doublets, in which two cells are captured and barcoded together, can also produce reads that are incorrectly called as novel introns.
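One way to see the 3’ coverage bias is a gene-body coverage profile, in the spirit of standard RNA-seq QC plots. Each read's position is expressed as a relative coordinate along its transcript (0 = 5’ end, 1 = 3’ end). These positions are binned, and mean coverage near the 3’ end is compared with mean coverage near the 5’ end. The sketch below assumes those relative positions have already been computed. The function names, the 100 bins and the 20% edge windows are illustrative choices.

```python
import numpy as np


def gene_body_profile(rel_positions, n_bins: int = 100) -> np.ndarray:
    """Histogram of read positions along transcripts (0 = 5' end, 1 = 3' end)."""
    pos = np.clip(np.asarray(rel_positions, dtype=float), 0.0, 1.0)
    counts, _ = np.histogram(pos, bins=n_bins, range=(0.0, 1.0))
    return counts


def three_prime_bias(rel_positions, n_bins: int = 100, edge_fraction: float = 0.2) -> float:
    """Mean coverage in the 3'-most bins divided by mean coverage in the 5'-most bins.

    ~1 means even coverage; values well above 1 indicate 3' enrichment.
    """
    counts = gene_body_profile(rel_positions, n_bins)
    k = max(1, int(n_bins * edge_fraction))
    five_prime = counts[:k].mean()
    three_prime = counts[-k:].mean()
    return float(three_prime / five_prime) if five_prime > 0 else float("inf")


# Toy example: 2 reads near the 5' end and 8 near the 3' end.
print(three_prime_bias([0.05] * 2 + [0.95] * 8))
```

For a 3’-tagged protocol a high ratio is expected by design. The point is that introns near the 5’ end will receive few reads regardless of their true abundance.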

Include quality controls to minimize biases in your single-cell data

Quality controls (QCs) ensure that your single-cell transcriptomic workflows do not introduce unwanted technical biases. Singleron Biotechnologies makes it easy to adopt these QC parameters at every step of a scRNA-seq workflow, as shown below:

Tissue processing

In this step, tissues are dissociated to isolate single cells. When not carefully performed, tissue dissociation can cause cell stress and lysis. Singleron’s newest instrument, the PythoN i, can incorporate several QC parameters into a scRNA-seq workflow to distinguish viable cells from non-viable cells and debris. Viability stains such as Trypan Blue, propidium iodide (PI), and SYTO9 can distinguish cells from debris, and viable cells from non-viable cells, for counting. Using these QC parameters, the PythoN i was shown to isolate single cells at 90% viability, regardless of cell type and phenotype (Figure 1).

Figure 1. Single-cell dissociation results from different mouse organs using the PythoN i.
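Viability from a stain-based count is simple arithmetic: live cells divided by all counted cells, with debris excluded from both counts. With Trypan Blue or PI, the dead cells are the stained (dye-permeable) ones. In the minimal sketch below, the function names are hypothetical. The 90% acceptance threshold simply mirrors the figure quoted above and should be set to your own protocol's requirement.

```python
def viability_percent(live_cells: int, dead_cells: int) -> float:
    """Percentage of viable cells among all counted cells (debris excluded)."""
    total = live_cells + dead_cells
    if total <= 0:
        raise ValueError("no cells counted")
    return 100.0 * live_cells / total


def passes_viability(live_cells: int, dead_cells: int, threshold: float = 90.0) -> bool:
    # Hypothetical acceptance threshold; adjust to your own protocol.
    return viability_percent(live_cells, dead_cells) >= threshold


# Worked example: 180 unstained (live) and 20 stained (dead) cells.
print(viability_percent(180, 20))  # 90.0
```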

Library preparation

Other QCs are performed on the input RNA and on the cDNA libraries produced for sequencing. For example, the RNA integrity number (RIN) indicates how degraded the RNA is. When RNA is severely degraded, gene expression data can become skewed, making true intronic signals difficult to detect. Singleron’s GEXSCOPE RNA Library Kit also includes a QC step with purified cDNA that is compatible with standard platforms for assessing cDNA concentration and integrity, including the Qubit 4 Fluorometer and the Agilent Fragment Analyzer 5200. Fragment analysis produces electropherograms showing fragment lengths, both for the rRNA peaks used to calculate RIN and for all other cDNA in the library (Figure 2). This helps ensure that reads aligned to introns derive from nascent mRNA transcripts in snRNA-seq data, or from intron-containing mature mRNA transcripts in scRNA-seq data, rather than from degraded material.

Figure 2. Sample QC of a next-generation sequencing (NGS) library on the Agilent Fragment Analyzer 5200. The library was generated using the GEXSCOPE Single Cell RNA Library Kit.
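An electropherogram like the one in Figure 2 can be summarized numerically. Two useful values are the share of total signal within a chosen size range and the signal-weighted mean fragment size in that range. An unusually large share of signal at long sizes, for instance, is one way incomplete fragmentation shows up. The sketch below assumes the trace has been exported as paired arrays of fragment size (bp) and signal intensity. The function name and the size range in the example are hypothetical; they are not settings of any instrument's software.

```python
import numpy as np


def smear_summary(sizes_bp, signal, low_bp: float, high_bp: float):
    """Return (fraction of total signal in [low_bp, high_bp],
    signal-weighted mean fragment size within that range)."""
    sizes = np.asarray(sizes_bp, dtype=float)
    # Negative values (baseline noise) are clipped to zero before summing.
    sig = np.clip(np.asarray(signal, dtype=float), 0.0, None)
    in_range = (sizes >= low_bp) & (sizes <= high_bp)
    total = sig.sum()
    region = sig[in_range].sum()
    fraction = float(region / total) if total > 0 else float("nan")
    mean_size = float((sizes[in_range] * sig[in_range]).sum() / region) if region > 0 else float("nan")
    return fraction, mean_size


# Toy trace with four size points; summarize the 400-1,500 bp region.
print(smear_summary([100, 500, 1000, 2000], [1, 2, 3, 4], 400, 1500))
```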

Bioinformatics

Even after sequencing data are acquired, removing low-quality reads and eliminating noisy data is essential for identifying distinct cell lineages based on gene expression patterns. QCs in bioinformatics pipelines include preprocessing steps that filter out low-quality reads and barcodes, such as those derived from doublets or degraded RNA. Pipelines also apply feature selection to exclude genes with zero read counts, since these genes are unlikely to be biologically informative for the cells being studied.
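The cell- and gene-level filtering described above can be sketched on a cells × genes count matrix. The sketch removes barcodes with too few detected genes and, optionally, barcodes with unusually high total counts as a crude doublet proxy (dedicated doublet detectors such as Scrublet are more reliable). It then drops genes with zero counts across the remaining cells. The function name and the default of 200 genes per cell are illustrative assumptions, not CeleSCOPE settings.

```python
import numpy as np


def basic_qc(counts, min_genes_per_cell: int = 200, max_counts_per_cell=None):
    """Filter a cells x genes count matrix; returns (filtered, kept cells, kept genes).

    1. Drop cells expressing fewer than `min_genes_per_cell` genes.
    2. Optionally drop cells whose total counts exceed `max_counts_per_cell`
       (a crude, hypothetical doublet proxy).
    3. Drop genes with zero counts across the remaining cells.
    """
    X = np.asarray(counts)
    keep_cells = (X > 0).sum(axis=1) >= min_genes_per_cell
    if max_counts_per_cell is not None:
        keep_cells &= X.sum(axis=1) <= max_counts_per_cell
    X = X[keep_cells]
    keep_genes = X.sum(axis=0) > 0
    return X[:, keep_genes], keep_cells, keep_genes


# Toy matrix: 3 cells x 4 genes; the second cell is empty.
filtered, cells, genes = basic_qc([[1, 0, 2, 0], [0, 0, 0, 0], [5, 0, 5, 5]], min_genes_per_cell=2)
print(filtered.shape)
```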

Singleron’s sequencing data processing pipeline, CeleSCOPE, includes each of these QC features to reliably count intronic reads and characterize the gene expression profiles of individual cells in complex tissues. After accounting for these biases, libraries generated with Singleron’s GEXSCOPE kit had the lowest proportion of intronic reads among the scRNA-seq kits compared (Figure 3).

Figure 3. Percentage of reads mapped to exonic and intronic sequences for scRNA-seq kits from six vendors across replicates.
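Percentages like those in Figure 3 come from assigning each aligned read to a genomic feature class. The sketch below is a simplified version for a single gene, using half-open [start, end) coordinates. A read fully inside an exon is "exonic". A read inside the gene that touches no exon is "intronic". A read outside the gene is "intergenic". Anything else, such as a read spanning an exon-intron boundary, is "ambiguous". Split (spliced) alignments, strand and overlapping genes are deliberately ignored. Real counting tools handle these cases, so treat this as an illustration of the idea only.

```python
from collections import Counter


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify_read(read, gene, exons) -> str:
    start, end = read
    g_start, g_end = gene
    if not overlaps(start, end, g_start, g_end):
        return "intergenic"
    if any(e_s <= start and end <= e_e for e_s, e_e in exons):
        return "exonic"
    if not any(overlaps(start, end, e_s, e_e) for e_s, e_e in exons):
        # Inside the gene body without touching an exon: intronic.
        return "intronic" if g_start <= start and end <= g_end else "ambiguous"
    return "ambiguous"  # e.g. a read spanning an exon-intron boundary


def percent_by_class(reads, gene, exons) -> dict:
    counts = Counter(classify_read(r, gene, exons) for r in reads)
    n = len(reads)
    return {label: 100.0 * c / n for label, c in counts.items()} if n else {}


# Toy gene at [0, 1000) with two exons; one read of each class.
reads = [(10, 60), (200, 250), (80, 130), (2000, 2050)]
print(percent_by_class(reads, (0, 1000), [(0, 100), (900, 1000)]))
```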

Conclusion: incorporate intron data with confidence using Singleron Biotechnologies

Introns generally do not encode protein sequences, but they add an extra layer of regulation to gene expression in the human genome. In scRNA-seq, intronic reads create an opportunity to discover how introns contribute to cellular diversity within tissues in health and disease.

Singleron Biotechnologies has developed end-to-end products to support scRNA-seq and snRNA-seq protocols. From tissue dissociation to bioinformatics analysis, Singleron’s instruments, kits and proprietary pipeline are designed to deliver intron-aligned reads that you can trust. With those reads, you can glean novel insights into eukaryotic transcriptional regulation and elucidate the role that introns play in human physiology and disease.

References

1. Verta JP, Jacobs A. The role of alternative splicing in adaptation and evolution. Trends in Ecology & Evolution. 2022;37(4):299-308. doi:10.1016/j.tree.2021.11.010

2. Monteuuis G, Wong JJL, Bailey CG, Schmitz U, Rasko JEJ. The changing paradigm of intron retention: regulation, ramifications and recipes. Nucleic Acids Res. 2019;47(22):11497-11513. doi:10.1093/nar/gkz1068

3. Broseus L, Ritchie W. Challenges in detecting and quantifying intron retention from next generation sequencing data. Computational and Structural Biotechnology Journal. 2020;18:501-508. doi:10.1016/j.csbj.2020.02.010

4. Shivram H, Iyer VR. Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies. RNA. 2018;24(9):1266-1274. doi:10.1261/rna.066217.118

5. Hagemann-Jensen M, Ziegenhain C, Chen P, et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol. 2020;38(6):708-714. doi:10.1038/s41587-020-0497-0

6. Picelli S, Faridani OR, Björklund ÅK, Winberg G, Sagasser S, Sandberg R. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc. 2014;9(1):171-181. doi:10.1038/nprot.2014.006

7. Ramsköld D, Luo S, Wang YC, et al. Full-Length mRNA-Seq from single cell levels of RNA and individual circulating tumor cells. Nat Biotechnol. 2012;30(8):777-782. doi:10.1038/nbt.2282

8. Macosko EZ, Basu A, Satija R, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015;161(5):1202-1214. doi:10.1016/j.cell.2015.05.002

9. Tian B, Pan Z, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 2007;17(2):156-165. doi:10.1101/gr.5532707

10. Svoboda M, Frost HR, Bosco G. Internal oligo(dT) priming introduces systematic bias in bulk and single-cell RNA sequencing count data. NAR Genom Bioinform. 2022;4(2):lqac035. doi:10.1093/nargab/lqac035

11. Zhang C, Fang Y, Chen W, et al. Improving the RNA velocity approach with single-cell RNA lifecycle (nascent, mature and degrading RNAs) sequencing technologies. Nucleic Acids Res. 2023;51(22):e112. doi:10.1093/nar/gkad969

12. Fan J, Slowikowski K, Zhang F. Single-cell transcriptomics in cancer: computational challenges and opportunities. Exp Mol Med. 2020;52(9):1452-1465. doi:10.1038/s12276-020-0422-0

13. Braunschweig U, Barbosa-Morais NL, Pan Q, et al. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 2014;24(11):1774-1786. doi:10.1101/gr.177790.114

14. Li H, Funk CC, McFarland K, et al. Integrative functional genomic analysis of intron retention in human and mouse brain with Alzheimer’s disease. Alzheimers Dement. 2021;17(6):984-1004. doi:10.1002/alz.12254

15. Dvinge H, Bradley RK. Widespread intron retention diversifies most cancer transcriptomes. Genome Med. 2015;7(1):45. doi:10.1186/s13073-015-0168-9

16. Xiong HY, Alipanahi B, Lee LJ, et al. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2015;347(6218):1254806. doi:10.1126/science.1254806

17. Gupta A, Shamsi F, Altemose N, et al. Characterization of transcript enrichment and detection bias in single-nucleus RNA-seq for mapping of distinct human adipocyte lineages. Genome Res. 2022;32(2):242-257. doi:10.1101/gr.275509.121

18. Lee S, Zhang AY, Su S, et al. Covering all your bases: incorporating intron signal from RNA-seq data. NAR Genom Bioinform. 2020;2(3):lqaa073. doi:10.1093/nargab/lqaa073

A post by Salih Yilmaz
