AI Virtual Cell Model (AIVC)

15.03.2026

11’

What if we could observe how a human cell responds to a drug, a genetic change, or an environmental shift—without performing a single wet‑lab experiment?

This is the objective of the AI Virtual Cell Model (AIVC): a computational model designed to simulate the behavior of cells and cellular systems.

Biological research has relied on testing individual mechanisms through experiments. While this approach has generated enormous insight, as any biology PhD student will testify, it is slow, expensive, and difficult to scale.

Two major technological shifts have reshaped what is possible:

Single cell multi-omics: Single cell and multi‑omics platforms generate large, detailed datasets across tissues, diseases, and perturbations.
Artificial intelligence: Modern AI methods learn complex biological patterns directly from data without requiring manually defined pathways.

AIVC sits at the intersection of these breakthroughs. Instead of modeling one pathway or gene at a time, it learns the behavior of the entire cell as an integrated system.

The vision is ambitious. AIVCs could change how we discover drugs, understand disease, and design personalized treatments. However, biology is complex and current data are incomplete. Building a model of an entire living cell, not to mention different cell types and their interactions, remains a substantial scientific challenge.

This article outlines what AIVCs are, why they matter, what data they require, and where the field is heading.

What is an AI Virtual Cell Model?

The foundational concept of the AIVC is to move beyond modeling isolated cellular processes to create a comprehensive, predictive simulator of the entire cell.

Traditionally, biological research focuses on specific pathways. While these mechanistic models are powerful, they are narrow in scope. AIVC integrates multiple biological layers—gene expression, epigenetic state, protein interactions, and spatial context—to create unified models for the whole cell.

In their landmark publication, Bunne et al. (2024) define AIVC as a multi‑modal, multi‑scale model that can:

Create Universal Representations (URs). AIVC maps biological data from different individuals, conditions, and modalities into a shared computational space. This creates a universal reference for cell states across cell types and species.
Predict Cellular Behavior. The model predicts how cells respond to perturbations, whether genetic (mutations or manipulations), chemical (drugs), or environmental. It can simulate dynamic processes like differentiation, disease progression, and aging, including cell states it has not previously encountered.
Enable In Silico Experimentation. AIVCs allow researchers to perform virtual experiments that would be costly or impossible in a wet lab. This enables the computational generation and prioritization of hypotheses, ensuring that only the most promising ones are validated experimentally.

Why Do We Need AIVC?

The development of AIVCs addresses several persistent challenges in biomedical research.

Scaling Biological Discoveries. Diseases, development, and biodiversity are driven by coordinated changes across molecular layers. AIVCs provide a systems‑level view, modeling how disruptions propagate through regulatory networks to alter cell states and phenotypes. The aim, as Stephen Quake puts it, is that “cell biology goes from being 90% experimental and 10% computational to the other way around.”
Accelerating Drug Discovery. By simulating drug–cell interactions, AIVCs can rapidly screen candidates, predict efficacy and toxicity, and identify novel therapeutic targets. This reduces the time and cost of development.
Reducing Reliance on Animal Models. Animal models are important for validating hypotheses before clinical trials but have limitations. Species differences in gene regulation and metabolism can lead to inaccurate predictions of human responses. High‑fidelity virtual representations of human cell states provide complementary validation, potentially reducing unnecessary experiments and improving translational accuracy.
Advancing Precision Medicine. The ultimate vision includes patient‑specific “virtual twins.” These digital models, built from an individual’s cellular data and medical records, could predict patient responses to treatment and inform personalized therapies.

A Brief History of the Virtual Cell Concept

The idea of a virtual cell has evolved over several decades (see Elliot Hershberg's detailed overview):

1993 – Casini: The essay “in vivo, in vitro, in silicio: towards a virtual cell” was likely the first published mention of the concept. It imagined computational models across multiple data modalities used as a universal reference to predict the function of new sequences, diagnose disease, and test gene therapies.
1997 – Schaff et al. developed computational frameworks for modeling cellular structure and function, specifically modeling intracellular signaling in neuronal cells.
1999 – Tomita: In collaboration with Craig Venter, Tomita developed E‑Cell, software for integrated modeling of gene regulation, metabolism, and signaling. This resulted in the first virtual cell model with 127 genes.
2012 – Karr et al.: Published the first whole‑cell computational model of Mycoplasma genitalium. This model integrated multiple biological processes and successfully predicted phenotypic outcomes from genotypic changes, validated against wet‑lab gene knockout data.

These early efforts were largely mechanistic, requiring painstaking annotation of biochemical reactions. As multi‑omic data and AI methods became available, the field shifted. The emphasis moved from building detailed models of single organisms to training scalable models across many tissues and conditions.

Recent developments:

Generative foundation models such as scGPT (2024), CellFM (2025), and scLong (2026) are trained on single cell multi‑omics data from tens to hundreds of millions of cells and have demonstrated the ability to predict drug responses, perturbations, and gene functions.
Large initiatives, including the CZI Virtual Cells Platform, Google DeepMind, and the Arc Virtual Cell Atlas, are currently making major investments in data generation and model creation.

A recent Nature Reviews Genetics piece (Wu, 2026) looked back at the shift from mechanistic approaches to large-scale AI models and pointed out the loss of interpretability in modern, black-box virtual cells, a challenge this article returns to under Key Challenges. It remains to be seen whether the next generation of virtual cells can combine the best of both worlds.

Data Requirements for AIVC

AIVCs depend entirely on the quality of their training data. Predictive virtual cells require large, diverse, well‑annotated, and biologically coherent single cell datasets spanning tissues, diseases, and perturbations.

Single Cell Sequencing

Single cell sequencing is central to this field because it enables the consistent collection of high‑dimensional data from individual cells at scale. While it began with scRNA‑seq, it now integrates other multi‑omic dimensions.

Multi‑Omics

Transcriptomics captures only one stage of cellular function. Other essential layers include:

Chromatin accessibility (ATAC‑seq): Captures regulatory potential.
Proteomics: Reflects functional protein abundance.
Spatial transcriptomics: Preserves tissue architecture and cell–cell interactions.
Subcellular imaging: Informs morphology and compartmentalization.
Functional genomics screens: Reveal causal relationships.

Perturbation Data

While observational data describe existing states, perturbation experiments reveal causal relationships. To predict responses to new interventions, AIVCs must be trained on diverse perturbation scenarios.
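To make the idea of learning from perturbation data concrete, here is a minimal sketch of a deliberately simple additive baseline: it estimates the average per-gene shift between control and perturbed cells and applies that shift to a new cell. This is not how any published AIVC works; the function names and the toy expression values are assumptions chosen for illustration only.

```python
from statistics import mean

def mean_profile(cells):
    """Average expression per gene across cells (each cell is a list of floats)."""
    return [mean(values) for values in zip(*cells)]

def perturbation_shift(control_cells, perturbed_cells):
    """Per-gene difference between the perturbed and control mean profiles."""
    ctrl = mean_profile(control_cells)
    pert = mean_profile(perturbed_cells)
    return [p - c for p, c in zip(pert, ctrl)]

def predict_perturbed(cell, shift):
    """Apply the learned shift to an unperturbed cell, clipping at zero expression."""
    return [max(0.0, x + d) for x, d in zip(cell, shift)]

# Hypothetical toy data: 3 genes, expression in arbitrary units.
control = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
perturbed = [[1.0, 4.0, 2.0], [3.0, 4.0, 0.0]]

shift = perturbation_shift(control, perturbed)   # [0.0, 2.0, -1.0]
new_cell = [2.0, 1.0, 0.5]
predicted = predict_perturbed(new_cell, shift)   # [2.0, 3.0, 0.0]
print(shift, predicted)
```

A baseline like this can only replay shifts it has already observed, which is one way to see why predicting responses to new interventions calls for diverse perturbation data and more expressive models.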

How Single Cell Data Drive AIVC Model Development

Single cell multi‑omics data support AIVCs in three primary ways:

Training Foundation Models: Large datasets enable pre‑training of generalizable models that learn core biological representations.
Building Disease‑Specific Models: Focused datasets allow for specialized models of tumor microenvironments, neurodegeneration, immune dysregulation, and metabolic disorders. These require well‑curated patient samples and consistent metadata.
Testing and Validation: Independent datasets are essential for evaluating predictive accuracy on unseen conditions or patient cohorts.
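The validation point above, evaluating on unseen conditions, can be sketched as a split that holds out whole conditions rather than random cells. The record layout, the condition labels, and the `split_by_condition` helper are hypothetical and stand in for whatever metadata a real dataset provides.

```python
import random

def split_by_condition(records, test_fraction=0.25, seed=0):
    """Hold out whole conditions so the test set contains only unseen ones."""
    conditions = sorted({r["condition"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(conditions)
    n_test = max(1, round(len(conditions) * test_fraction))
    held_out = set(conditions[:n_test])
    train = [r for r in records if r["condition"] not in held_out]
    test = [r for r in records if r["condition"] in held_out]
    return train, test

# Hypothetical records: one dict per cell, labeled with its perturbation.
records = [
    {"cell_id": i, "condition": cond}
    for i, cond in enumerate(["ctrl", "drug_a", "drug_b", "knockout_x"] * 3)
]

train, test = split_by_condition(records)
train_conditions = {r["condition"] for r in train}
test_conditions = {r["condition"] for r in test}
print(sorted(train_conditions), sorted(test_conditions))
```

Splitting by condition rather than by cell means that test cells never share a perturbation label with the training set, which is closer to the "unseen conditions" setting than a random split.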

Key Challenges

Several obstacles currently limit AIVC capabilities:

Data Availability: High‑quality, disease‑specific samples are difficult to obtain, particularly for rare diseases and early‑stage conditions.
Cost: Single cell multi‑omics and perturbation screens remain expensive, making it difficult to generate datasets large enough for robust training.
Data Integration: Combining data from different platforms and laboratories introduces technical variability. Harmonization methods reduce noise but cannot eliminate all biases.
Missing Dimensions: Current datasets often lack time‑resolved measurements, high‑resolution spatial information, and longitudinal patient tracking.
Model Evaluation: There is no universal benchmark for virtual cell performance. Predictive claims must be validated through prospective experimental testing.
Interpretability: Understanding how deep learning models generate predictions is difficult.
Ethics: Using patient data to inform therapeutic decisions raises issues regarding privacy, bias, and equitable representation.
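As a small illustration of the data integration challenge listed above, the sketch below applies per-batch mean centering to one gene measured in two laboratories. It is a toy stand-in for real harmonization methods, and the batch labels and values are made up.

```python
from statistics import mean

def center_per_batch(values, batches):
    """Subtract each batch's own mean from its values (toy harmonization)."""
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]

# Hypothetical expression of one gene; lab_b reads about 10 units higher.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["lab_a"] * 3 + ["lab_b"] * 3

centered = center_per_batch(expression, batches)
print(centered)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The offset between laboratories disappears, but so would a genuine biological difference if the two batches happened to contain different cell populations. That trade-off is one reason harmonization can reduce technical noise without removing every bias.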

The Importance of Data Infrastructure

Data quality is the most consistent requirement for AIVC development. Predictive models require datasets that are large, diverse, well‑annotated, and harmonized.

Raw data are insufficient for this task. Standardized processing, unified ontologies, and rich clinical metadata are necessary to build reliable training corpora. This has led to the development of AI‑ready cellular knowledge bases that serve as the reference layer for virtual cell modeling.

Platforms such as SynEcoSys align with this need by providing standardized single cell datasets with consistent annotations and integrated metadata. By reducing technical noise and supporting systematic exploration, these infrastructures serve as foundational resources for training, validating, and interpreting AIVC models. For researchers, access to harmonized, high‑quality data is often the deciding factor in model reliability.

Future Directions: Toward Personalized Virtual Twins

The long‑term goal of AIVC research is high‑fidelity virtual twins.

In this vision, a patient’s genomic profile, single cell transcriptome, and clinical metadata would inform a personalized model. Researchers could simulate disease progression, drug responses, and adverse event risks.

Personalized virtual twins could also enable data-driven combination therapy. As the number of available drug candidates grows, the space of possible combinations and dosages quickly becomes infeasible to test in vitro. Clinicians could instead test combination therapies in silico and identify the best-suited approach for each patient.
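A quick back-of-the-envelope count shows why combinations outgrow in vitro testing. The sketch assumes every drug in a regimen can be given at the same number of discrete dose levels; the drug counts and dose levels are illustrative, not taken from any study.

```python
from math import comb

def n_combination_regimens(n_drugs, drugs_per_regimen, dose_levels):
    """Count regimens: choose the drugs, then pick one dose level for each."""
    return comb(n_drugs, drugs_per_regimen) * dose_levels ** drugs_per_regimen

# Illustrative numbers only: 20 candidate drugs, 5 dose levels each.
pairs = n_combination_regimens(20, 2, 5)     # 190 pairs * 25 dose settings = 4750
triples = n_combination_regimens(20, 3, 5)   # 1140 triples * 125 dose settings = 142500
print(pairs, triples)
```

Going from pairs to triples multiplies the number of experiments by 30 in this example, before accounting for replicates, cell types, or patients.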

For rare diseases that currently lack established treatment options, virtual twins could help predict potential drug targets and identify treatments suited to individual patients.

Early efforts toward clinical testing are already under way, such as Certainty, an EU-funded project to improve personalized cancer immunotherapy, and B2B-Rare, an international collaboration to identify new treatments for neuromuscular disorders.

While this goal remains distant, these early efforts suggest it is not unattainable.

Frequently Asked Questions

How accurate are AIVCs today?

Performance varies. Models show promising results in specific contexts, such as predicting gene perturbation effects, but broad prediction across all cell types is still being researched.

Are AIVCs ready for clinical use?

Not yet. For now, the AIVC remains a long-term vision. Most applications are in the preclinical or research stages. Rigorous validation and regulatory evaluation are required before clinical integration.

What is the relationship between AIVCs and Large Language Models (LLMs)?

An AIVC is an integrated system composed of multiple interconnected models. Different models may be used to process different data dimensions.

An LLM, by contrast, is one type of AI model. It typically uses the transformer architecture, a type of neural network well suited to handling variable, high-dimensional data such as sequencing data.

Other deep learning models, such as convolutional neural networks (CNNs) or diffusion models, can be better suited to other data modalities, such as interactions or spatial information.

Closing Thoughts

The AI Virtual Cell model reframes the cell as a system that can be computationally modeled and interrogated. Progress depends on high‑quality single cell data, robust AI architectures, and transparent validation.

As data infrastructure improves, virtual representations of cellular systems will become increasingly predictive, with significant impact across research, drug discovery, and precision medicine.

References

Bai, D. et al. (2026). scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics. Nat Commun 17, 2380. https://doi.org/10.1038/s41467-026-69102-y

Bunne, C. et al. (2024). How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 187(25), 7045–7063. https://doi.org/10.1016/j.cell.2024.11.015

Callaway, E. (2025). Can AI build a virtual cell? Scientists race to model life’s smallest unit. Nature, 643(8070), 13–14. https://doi.org/10.1038/d41586-025-02011-0

Casini, T. (1993). In vivo, in vitro, in silicio: towards a virtual cell. Trends Genet. 9(4), 105. https://doi.org/10.1016/0168-9525(93)90202-s

Cui, H. et al. (2024). scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 21, 1470–1480. https://doi.org/10.1038/s41592-024-02201-0

Dibaeinia, P. et al. (2026). Virtual Cells Need Context, Not Just Scale. bioRxiv. https://doi.org/10.64898/2026.02.04.703804

Hershberg, E. (n.d.). The virtual cell. Century of Bio. https://centuryofbio.com/p/virtual-cell

Karr, J. R. et al. (2012). A whole-cell computational model predicts phenotype from genotype. Cell, 150(2), 389–401. https://doi.org/10.1016/j.cell.2012.05.044

Schaff, J. et al. (1997). A general computational framework for modeling cellular structure and function. Biophys J. 73(3), 1135–1146. https://doi.org/10.1016/s0006-3495(97)78146-3

Tomita, M. (1999). E-CELL: software environment for whole-cell simulation. Bioinformatics. 15(1):72-84. https://doi.org/10.1093/bioinformatics/15.1.72

Wu, A. R. (2026). Revisiting the blueprint for an interpretable virtual cell. Nature Reviews Genetics. https://doi.org/10.1038/s41576-026-00940-8

Zeng, Y. et al. (2025). CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells. Nat Commun 16, 4679. https://doi.org/10.1038/s41467-025-59926-5

A post by Yingting Wang
