Annotatability

Annotatability is a method that identifies meaningful patterns in single-cell genomics data through annotation-trainability analysis. It estimates congruence in annotation structure by analyzing the training dynamics of deep neural networks.
Annotatability can be used to audit and rectify erroneous cell annotations, identify intermediate cell states, delineate complex temporal trajectories along development, characterize cell diversity in diseased tissue, identify disease-related genes, assess treatment effectiveness, and identify rare healthy-like cell populations.

Manuscript

Karin, J., Mintz, R., Raveh, B. et al. Interpreting single-cell and spatial omics data using deep neural network training dynamics. Nat Comput Sci (2024). https://doi.org/10.1038/s43588-024-00721-5

Reproducibility

For reproducibility of the results presented in the Annotatability manuscript, please refer to:
https://github.com/nitzanlab/Annotatability_notebooks

Installation

pip install Annotatability

How to use

Examples of how to use the method are available in the reproducibility repository (https://github.com/nitzanlab/Annotatability_notebooks), along with two additional tutorials that are not included in the manuscript: tutorial1 (runtime of a few minutes with a GPU, ~20 minutes without), finding erroneous annotations and intermediate cell states in retinal bipolar cells; and tutorial2, analysis of a COVID-19 case-control dataset.
Our code is based on the Scanpy package, but it can easily be adapted to any Python-based single-cell data analysis workflow. Annotatability comprises two code files:
models.py, which contains the functions for training the neural network and generating the trainability-aware graph.
metrics.py, which contains the scoring functions.
Imports:

from Annotatability import metrics, models
import numpy as np
import scanpy as sc
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler
import torch
import torch.optim as optim
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

Train the neural network and calculate the confidence and variability metrics
We take as input an annotated data object (of type AnnData) named "adata", and the annotation "label" (stored as an observation, i.e. a column of adata.obs) that we aim to analyze.
To estimate the confidence and variability of each cell's annotation, use the following commands:

epoch_num = 50  # can be changed
prob_list = models.follow_training_dyn_neural_net(adata, label_key='label', iterNum=epoch_num, device=device)
all_conf, all_var = models.probability_list_to_confidence_and_var(prob_list, n_obs=adata.n_obs, epoch_num=epoch_num)
# .cpu() is a no-op for CPU tensors and is required before .numpy() when training ran on a GPU
adata.obs["var"] = all_var.detach().cpu().numpy()
adata.obs["conf"] = all_conf.detach().cpu().numpy()
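For intuition about what the "conf" and "var" columns hold, the sketch below computes them in the style of data maps (dataset cartography): confidence as the mean, across epochs, of the probability the network assigns to each cell's annotated label, and variability as the standard deviation of that probability. These definitions are an assumption made for illustration; this NumPy code is not the package's probability_list_to_confidence_and_var, and the input layout (epochs by cells) is also assumed. The thresholds in the hypothetical categorize helper are arbitrary and only show how cells could be split into easy-to-learn, ambiguous, and hard-to-learn groups.

```python
import numpy as np

def confidence_and_variability(label_probs):
    """label_probs: array of shape (n_epochs, n_cells) giving, per epoch, the
    predicted probability of each cell's annotated label (assumed layout)."""
    label_probs = np.asarray(label_probs, dtype=float)
    conf = label_probs.mean(axis=0)  # average over epochs
    var = label_probs.std(axis=0)    # spread over epochs
    return conf, var

def categorize(conf, var, conf_thr=0.5, var_thr=0.15):
    """Hypothetical thresholds; they are not taken from Annotatability."""
    groups = np.full(conf.shape, "easy-to-learn", dtype=object)
    groups[var >= var_thr] = "ambiguous"                        # unstable across epochs
    groups[(var < var_thr) & (conf < conf_thr)] = "hard-to-learn"  # consistently low
    return groups

# Toy probabilities for 3 epochs x 3 cells (illustrative values, not real results).
probs = np.array([[0.90, 0.2, 0.10],
                  [0.95, 0.8, 0.10],
                  [0.97, 0.3, 0.15]])
conf, var = confidence_and_variability(probs)
print(categorize(conf, var))
```

In this toy input the first cell is learned early and stays confident, the second fluctuates between epochs, and the third is never learned, which is the kind of pattern used to flag possibly erroneous annotations.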

The following hyperparameters of 'follow_training_dyn_neural_net' can be changed:

    iterNum : int, optional (default=100)
        Number of training iterations (epochs).

    lr : float, optional (default=0.001)
        Learning rate for the optimizer.

    momentum : float, optional (default=0.9)
        Momentum for the optimizer.

    device : str, optional (default='cpu')
        Device for training the neural network ('cpu' or 'cuda' for GPU).

    weighted_sampler : bool, optional (default=True)
        Whether to use a weighted sampler for class imbalance.

    batch_size : int, optional (default=256)
        Batch size for training.

    num_layers : int, optional (default=3)
        Depth of the neural network. Allowed values: 3, 4, or 5.
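The weighted_sampler option addresses class imbalance. A common approach, shown here as an assumption rather than the package's exact code, is to weight each cell by the inverse frequency of its label so that every cell type is drawn about equally often; the resulting weights could then be passed to torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True). The helper name inverse_frequency_weights is hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-cell sampling weights proportional to 1 / (size of the cell's class)."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()  # normalise so the weights sum to 1

labels = np.array(["rod", "rod", "rod", "cone"])  # toy, imbalanced labels
w = inverse_frequency_weights(labels)

# Each class receives the same total sampling mass.
for c in np.unique(labels):
    print(c, w[labels == c].sum())
```

Normalising is optional for WeightedRandomSampler, which only uses the relative sizes of the weights; it is done here so the per-class totals are easy to read.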

Compute the annotation-trainability score

adata_ranked = metrics.rank_genes_conf_min_counts(adata)

The scores are stored as columns of adata_ranked.var:

adata_ranked.var['conf_score_high']  # annotation-trainability positive association score
adata_ranked.var['conf_score_low']   # annotation-trainability negative association score
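Once the scores are in adata_ranked.var, they can be sorted like any pandas column. The sketch below uses a small made-up DataFrame standing in for adata_ranked.var (gene names and values are illustrative only) and a hypothetical helper, top_genes, that returns the highest-scoring genes for either score column.

```python
import pandas as pd

def top_genes(var_df, column="conf_score_high", n=10):
    """Return the names of the n genes with the largest values in `column`."""
    return var_df[column].nlargest(n).index.tolist()

# Toy stand-in for adata_ranked.var; the values are not real results.
var_df = pd.DataFrame(
    {"conf_score_high": [0.8, 0.1, 0.5], "conf_score_low": [0.0, 0.9, 0.2]},
    index=["GeneA", "GeneB", "GeneC"],
)
print(top_genes(var_df, "conf_score_high", n=2))
print(top_genes(var_df, "conf_score_low", n=1))
```

On real data the same call would be top_genes(adata_ranked.var, "conf_score_high", n=20).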

Trainability-aware graph embedding

import scipy.sparse as sp
connectivities_graph, distance_graph = metrics.make_conf_graph(adata.copy(), alpha=0.9, k=15)
adata.obsp['connectivities'] = sp.csr_matrix(connectivities_graph)

Note: 'alpha' can be adjusted.
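Assuming k is the number of neighbours kept per cell, the sketch below shows what such a parameter typically controls: each cell keeps edges only to its k nearest cells, and the result is stored as a sparse CSR matrix, like the one assigned to adata.obsp['connectivities'] above. It is a generic, unweighted k-nearest-neighbour construction written for illustration; it does not reproduce how make_conf_graph uses the confidence values or alpha, and the function name knn_connectivities is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def knn_connectivities(X, k):
    """Binary, symmetrised k-nearest-neighbour graph over the rows of X."""
    X = np.asarray(X, dtype=float)
    # Dense pairwise Euclidean distances (adequate for small toy inputs only).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # exclude self-edges
    nbrs = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest cells per row
    n = X.shape[0]
    rows = np.repeat(np.arange(n), k)
    A = sp.csr_matrix((np.ones(n * k), (rows, nbrs.ravel())), shape=(n, n))
    # Keep an edge if either cell selected the other.
    return A.maximum(A.T).tocsr()

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # two toy clusters
G = knn_connectivities(X, k=1)
print(G.toarray())
```

Larger k values connect each cell to more of its surroundings, which generally produces smoother embeddings at the cost of blurring small populations.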
To visualize the trainability-aware graph, use the following functions:

sc.tl.umap(adata)
sc.pl.umap(adata, color='conf')

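Note that running sc.pp.neighbors(adata) stores the standard neighbors graph in adata.obsp['connectivities'], overwriting the trainability-aware graph. If you need sc.pp.neighbors (sc.tl.umap expects the adata.uns['neighbors'] entry it creates), run it before assigning the trainability-aware graph.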

System requirements

python (>3.0)

packages:

"numpy",
"scanpy",
"numba",
"pandas",
"scipy",
"matplotlib",
"pytest",
"torch"

Running the tests

conda create -n Annotatability python=3.10
conda activate Annotatability
pip install .
pytest tests/test

Contact

Jonathan Karin - jonathan.karin [at] mail.huji.ac.il
