Decoding the Biological Meaning of Your Data: The Power of Accurate Automated Cell Type Annotation

22.11.2023


One of the crucial steps in analyzing single cell RNA sequencing (scRNA-seq) data is cell type annotation: determining the identity of each cell based on its transcriptomic profile. In a typical workflow, annotation is carried out after dimensionality reduction and clustering. It helps distinguish the different cell populations within a sample and provides valuable biological insight to improve data interpretation.

Approaches for cell type annotation

There are two main approaches for cell type annotation: manual and automated.

Manual cell type annotation

Manual cell type annotation relies on well-known, established biomarkers obtained from the literature or from databases. Researchers annotate cell types by visualizing and inspecting the expression of these markers at the cluster or cell level. However, this process requires biological knowledge, is highly subjective, and is prone to variation. It is also time-consuming: for a typical single cell dataset, it can take 20 to 40 hours to manually annotate 30 clusters.

Automated cell type annotation

To automate the process of cell type annotation, researchers have developed several computational tools following different principles, such as using marker genes and reference datasets (1). Below are a few examples.

1. Marker gene database-based: In this approach, a curated list of marker genes collated from several studies, cell atlases or databases is used. CelliD, a tool based on multiple correspondence analysis, uses marker genes to perform cell type annotation at the single cell level (2). CelliD is available on our SynEcoSys platform as part of the downstream analysis. Other tools, such as scCATCH and SCSA, use a scoring system based on marker expression and perform annotation at the cluster level.

2. Correlation-based: In this approach, labelled scRNA-seq or bulk datasets are used as the reference. Tools such as SingleR and scmap-cell measure the similarity between the reference and query datasets using measures such as Spearman correlation or cosine similarity. The labels of the reference cells with the highest similarity are then assigned to the cells in the query dataset.

3. Supervised classification-based: Using machine learning algorithms, classifiers or models are trained on labelled (annotated) reference scRNA-seq datasets. The trained models are then applied to query datasets to predict the cell types. MapCell is one such example. It is a supervised cell type annotation tool built on a Siamese neural network with a few-shot training approach, and it can predict the cell type at the single cell level (3). Another tool, CellTypist, is based on logistic regression optimised by the stochastic gradient descent algorithm. It contains models trained on different tissues to perform automatic annotation (4).
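To make the first two principles more concrete, here is a minimal toy sketch in plain Python. It is not how CelliD, scCATCH, SCSA, SingleR or scmap-cell are implemented; it only illustrates the general ideas of scoring a cluster by the mean expression of its marker genes and of transferring the label of the best-correlated reference profile. The marker lists, reference profiles, expression values and function names are all made up for illustration.

```python
from statistics import mean

# Toy inputs: every gene list, cell type and value below is hypothetical.
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
}
REFERENCE = {  # labelled reference expression profiles
    "T cell": {"CD3D": 5.0, "CD3E": 4.0, "CD79A": 0.2, "MS4A1": 0.1, "ACTB": 6.0},
    "B cell": {"CD3D": 0.1, "CD3E": 0.3, "CD79A": 4.5, "MS4A1": 5.0, "ACTB": 6.2},
}


def annotate_by_markers(profile, markers=MARKERS):
    """Marker-based: choose the cell type whose markers have the highest mean expression."""
    scores = {ct: mean(profile.get(g, 0.0) for g in genes) for ct, genes in markers.items()}
    return max(scores, key=scores.get)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # assumes neither vector is constant


def annotate_by_correlation(profile, reference=REFERENCE):
    """Correlation-based: transfer the label of the most similar reference profile."""
    scores = {}
    for cell_type, ref in reference.items():
        genes = sorted(set(profile) & set(ref))  # compare shared genes only
        scores[cell_type] = spearman([profile[g] for g in genes], [ref[g] for g in genes])
    return max(scores, key=scores.get)


# Average expression of one query cluster (hypothetical values)
query_cluster = {"CD3D": 0.0, "CD3E": 0.2, "CD79A": 3.8, "MS4A1": 4.1, "ACTB": 5.5}
print(annotate_by_markers(query_cluster))
print(annotate_by_correlation(query_cluster))
```

In this toy case both approaches assign the query cluster to the same cell type. On real data they can disagree, which is one reason why the choice of markers and reference datasets matters.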

With automated tools, annotation can be done in a relatively short time, with more consistent results and better reproducibility. Automated cell type annotation has become an indispensable component of single cell data analysis pipelines. However, the robustness of the annotation depends on the marker genes and reference datasets used, and the results still require careful validation and refinement.

Want to learn more about automated cell type annotation? Check out our upcoming Bioinformatics Bootcamp! Contact info@singleron.bio to learn more.

References

1. Pasquini, Giovanni, et al. “Automated methods for cell type annotation on scRNA-seq data.” Computational and Structural Biotechnology Journal 19 (2021): 961-969.

2. Cortal, Akira, et al. “Gene signature extraction and cell identity recognition at the single-cell level with Cell-ID.” Nature Biotechnology 39.9 (2021): 1095-1102.

3. Koh, Winston, and Shawn Hoon. “MapCell: learning a comparative cell type distance metric with siamese neural nets with applications toward cell-type identification across experimental datasets.” Frontiers in Cell and Developmental Biology 9 (2021): 767897.

4. Xu, Chuan, et al. “Automatic cell type harmonization and integration across Human Cell Atlas datasets.” bioRxiv (2023): 2023-05.

A post by Prabhakaran Munusamy
