Decoding the Biological Meaning of Your Data: The Power of Accurate Automated Cell Type Annotation

22.11.2023


One of the crucial steps in analyzing single-cell RNA sequencing (scRNA-seq) data is cell type annotation: determining the identity of each cell based on its transcriptomic profile. In a typical analysis, annotation follows dimensionality reduction and clustering. It helps distinguish the different cell populations within a sample and provides biological insight that improves data interpretation. There are two main approaches to cell type annotation: manual and automated.

Manual cell type annotation

Researchers manually annotate cell types using well-established biomarkers obtained from the literature or from databases, visualizing and inspecting the expression of these markers at the cluster or cell level. However, this process requires biological knowledge, is highly subjective, and is prone to variation between annotators. It is also time-consuming: for a typical single-cell dataset, manually annotating 30 clusters can take 20 to 40 hours.

Automated cell type annotation

To automate the process of cell type annotation, researchers have developed several computational tools following different principles, such as using marker genes and reference datasets (1). Below are a few examples.

1. Marker gene database-based:

In this approach, a curated list of marker genes collated from several studies, cell atlases, or databases is used. CelliD, a tool based on multiple correspondence analysis, uses marker genes to perform cell type annotation at the single-cell level (2). CelliD is available on our SynEcoSys platform as part of the downstream analysis. Other tools, such as scCATCH and SCSA, use a scoring system based on marker expression and perform annotation at the cluster level.
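As a rough illustration of cluster-level marker scoring, the sketch below gives each cluster the cell type whose markers have the highest mean expression. It is a toy example, not the scoring scheme of scCATCH, SCSA, or CelliD. The marker lists, cluster names, and expression values are all made up for illustration.

```python
from statistics import mean

# Hypothetical marker lists; real tools draw these from curated databases.
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
    "Monocyte": ["CD14", "LYZ"],
}

# Made-up mean expression per cluster (e.g. log-normalised values).
cluster_means = {
    "cluster_0": {"CD3D": 2.1, "CD3E": 1.8, "CD79A": 0.1,
                  "MS4A1": 0.0, "CD14": 0.2, "LYZ": 0.3},
    "cluster_1": {"CD3D": 0.1, "CD3E": 0.0, "CD79A": 2.4,
                  "MS4A1": 1.9, "CD14": 0.1, "LYZ": 0.2},
    "cluster_2": {"CD3D": 0.0, "CD3E": 0.1, "CD79A": 0.2,
                  "MS4A1": 0.1, "CD14": 1.7, "LYZ": 3.0},
}


def score_cluster(expr, markers):
    """Mean marker expression for each candidate cell type."""
    return {
        cell_type: mean(expr.get(gene, 0.0) for gene in genes)
        for cell_type, genes in markers.items()
    }


def annotate_clusters(means_by_cluster, markers):
    """Assign each cluster the cell type with the highest marker score."""
    labels = {}
    for cluster, expr in means_by_cluster.items():
        scores = score_cluster(expr, markers)
        labels[cluster] = max(scores, key=scores.get)
    return labels


cluster_labels = annotate_clusters(cluster_means, MARKERS)
for cluster, label in cluster_labels.items():
    print(cluster, "->", label)
```

In practice, a score like this would usually be combined with a minimum threshold so that clusters with no convincing marker signal are left unassigned rather than forced into a label.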

2. Correlation-based:

In this approach, the reference is either a bulk expression dataset or a labelled scRNA-seq dataset. Tools such as SingleR and scmap-cell measure the similarity between the reference and the query dataset using measures such as Spearman correlation or cosine similarity. Each cell in the query dataset is then assigned the label of the reference cells or profiles it most closely resembles.
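The idea of correlation-based label transfer can be sketched in a few lines: compute the Spearman correlation between a query cell and each labelled reference profile, then take the label of the best match. This is a simplified illustration under stated assumptions, not the algorithm implemented by SingleR or scmap-cell. The gene list, reference profiles, and query cells are invented, and the helper functions are hypothetical names.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation is Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Gene order shared by all vectors below (made-up values).
GENES = ["CD3D", "CD79A", "CD14", "LYZ", "MS4A1"]
reference_profiles = {
    "T cell": [5.0, 0.2, 0.1, 0.5, 0.0],
    "B cell": [0.1, 4.0, 0.2, 0.4, 3.5],
    "Monocyte": [0.0, 0.1, 3.0, 6.0, 0.2],
}
query_cells = {
    "cell_a": [3.2, 0.0, 0.3, 0.4, 0.1],
    "cell_b": [0.0, 0.3, 2.5, 4.1, 0.1],
}


def transfer_labels(queries, references):
    """Give each query cell the label of its most correlated reference."""
    labels = {}
    for name, expr in queries.items():
        labels[name] = max(
            references, key=lambda ct: spearman(expr, references[ct])
        )
    return labels


query_labels = transfer_labels(query_cells, reference_profiles)
print(query_labels)
```

Rank-based correlation is a natural choice here because it is less sensitive to differences in scale between reference and query datasets than a correlation on raw expression values.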

3. Supervised classification-based:

In this approach, machine learning classifiers are trained on labelled (annotated) reference scRNA-seq datasets. The trained models are then applied to query datasets to predict cell types. MapCell is one such example: a supervised cell type annotation tool built on a Siamese neural network and a few-shot training approach, which predicts the cell type at the single-cell level (3). Another tool, CellTypist, is based on logistic regression optimised by the stochastic gradient descent algorithm. It provides models trained on different tissues to perform automatic annotation (4).
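To make the train-then-predict workflow concrete, here is a minimal sketch of a two-class logistic regression fitted with stochastic gradient descent on a tiny labelled reference. It is not CellTypist's code or model; it only illustrates the general principle. The genes, expression values, hyperparameters, and function names are all assumptions chosen for the example.

```python
import math
import random

# Hypothetical feature genes and made-up labelled reference cells.
GENES = ["CD3D", "CD79A"]
reference = [
    ([3.0, 0.1], "T cell"),
    ([2.5, 0.3], "T cell"),
    ([3.4, 0.0], "T cell"),
    ([0.2, 2.8], "B cell"),
    ([0.0, 3.1], "B cell"),
    ([0.4, 2.2], "B cell"),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples, positive_label, epochs=200, lr=0.1, seed=0):
    """Fit a binary logistic regression one sample at a time (SGD)."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            y = 1.0 if label == positive_label else 0.0
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = y - p  # gradient of the log-likelihood w.r.t. the logit
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(x, model, positive_label, negative_label):
    """Return the predicted label for one query cell."""
    weights, bias = model
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return positive_label if p >= 0.5 else negative_label


model = train_sgd(reference, positive_label="B cell")

# Made-up query cells, using the same gene order as the reference.
query = [[2.8, 0.2], [0.1, 3.1]]
predictions = [predict(x, model, "B cell", "T cell") for x in query]
print(predictions)
```

A real classifier would handle many cell types, thousands of genes, and regularisation, and would typically report a probability so that low-confidence cells can be flagged for review.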

Automated tools make annotation possible in a relatively short time, give consistent results, and increase reproducibility. As a result, automated cell type annotation has become an indispensable component of single-cell data analysis pipelines. However, its robustness depends on the marker genes and reference datasets used, so the results still require careful validation and refinement.

Want to learn more about automated cell type annotation? Check out our upcoming Bioinformatics Bootcamp! Contact info@singleron.bio to learn more.

References

1. Pasquini, Giovanni, et al. “Automated methods for cell type annotation on scRNA-seq data.” Computational and Structural Biotechnology Journal 19 (2021): 961-969.

2. Cortal, Akira, et al. “Gene signature extraction and cell identity recognition at the single-cell level with Cell-ID.” Nature Biotechnology 39.9 (2021): 1095-1102.

3. Koh, Winston, and Shawn Hoon. “MapCell: learning a comparative cell type distance metric with siamese neural nets with applications toward cell-type identification across experimental datasets.” Frontiers in Cell and Developmental Biology 9 (2021): 767897.

4. Xu, Chuan, et al. “Automatic cell type harmonization and integration across Human Cell Atlas datasets.” bioRxiv (2023): 2023-05.

A post by Prabhakaran Munusamy

Prabhakaran Munusamy is a Bioinformatics Scientist at Singleron with more than 10 years of experience in genomics and transcriptomics data analysis. Holding a Master's degree in Bioinformatics from McGill University, Canada, he specializes in leveraging high-performance and cloud computing resources to develop robust bioinformatics workflows. With a strong publication record, Prabha is recognized for his analytical mindset, technical expertise, and clear scientific communication. In his free time, Prabha enjoys playing badminton and is among the top players on the Singleron team.
