SCITUNA: a novel single-cell data integration approach that combines both graph-based and anchor-based techniques. SCITUNA constructs a graph for each batch to represent intra-batch cell similarities, and a bipartite graph to capture inter-batch similarities. This transforms the integration problem into a many-to-one matching problem, where cells from a query batch are matched with cells from a reference batch. The resulting matches are then used to transform the query cell space to the reference cell space.
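To make these data structures concrete, here is a toy numpy sketch. It is not SCITUNA's algorithm: SCITUNA selects anchors by aligning the intra-graphs with the inter-graph (see the article). It only shows a k-nearest-neighbour (kNN) intra-batch graph, a bipartite kNN inter-batch graph, a many-to-one match, and an anchor-based shift of query cells toward the reference. All function names are hypothetical. Two choices are our own assumptions: matching each query cell to its closest reference cell, and averaging correction vectors over the query cell's intra-graph neighbours. Dense distance matrices keep the code short but scale quadratically, so this is only meant for small examples.

```python
import numpy as np


def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between the rows (cells) of a and b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def intra_graph(x: np.ndarray, k: int) -> np.ndarray:
    """kNN graph inside one batch: row i lists the k cells closest to cell i (itself excluded)."""
    d = sq_dists(x, x)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def inter_graph(query: np.ndarray, ref: np.ndarray, k: int) -> np.ndarray:
    """Bipartite kNN graph: row i lists the k reference cells closest to query cell i."""
    return np.argsort(sq_dists(query, ref), axis=1)[:, :k]


def match_many_to_one(query: np.ndarray, ref: np.ndarray, k: int = 5) -> np.ndarray:
    """Toy many-to-one matching (assumption): each query cell takes its closest bipartite
    neighbour (column 0, because argsort is ascending); several query cells may share
    the same reference cell."""
    return inter_graph(query, ref, k)[:, 0]


def integrate(query: np.ndarray, ref: np.ndarray, k: int = 5) -> np.ndarray:
    """Shift each query cell by the mean correction vector (matched reference cell minus
    query cell) over itself and its intra-batch neighbours."""
    matches = match_many_to_one(query, ref, k)
    corrections = ref[matches] - query                 # one correction vector per query cell
    nbrs = intra_graph(query, k)                       # smooth over the query batch's own graph
    neighbourhood = np.column_stack([np.arange(len(query)), nbrs])
    return query + corrections[neighbourhood].mean(axis=1)
```

For example, `integrate(query, ref)` with a query batch that is a constant offset of part of the reference moves the query cells back toward the reference cloud. When the two batches are identical, every cell matches itself and nothing moves.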
- SCITUNA operates directly in the original gene expression space.
- The method introduces a novel batch ordering strategy based on optimal transport cost.
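The exact ordering rule is defined in the article. The sketch below only illustrates the general idea under assumptions of our own. It computes an entropy-regularised optimal transport (Sinkhorn) cost between every pair of batches, using uniform cell weights and Euclidean ground cost. It then orders the batches by their total cost to all other batches, placing the most "central" batch first. The Sinkhorn solver, the `reg` value, and the ordering rule are illustrative choices, not SCITUNA's implementation.

```python
import numpy as np


def sinkhorn_cost(x: np.ndarray, y: np.ndarray, reg: float = 1.0, n_iter: int = 200) -> float:
    """Entropy-regularised OT cost between two uniformly weighted cell clouds (cells x genes)."""
    cost = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1))  # Euclidean distances
    a = np.full(len(x), 1.0 / len(x))   # uniform mass on the cells of x
    b = np.full(len(y), 1.0 / len(y))   # uniform mass on the cells of y
    kernel = np.exp(-cost / reg)        # Gibbs kernel; a very small reg would underflow here
    u = np.ones(len(x))
    for _ in range(n_iter):             # alternate scalings so the plan matches both marginals
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


def order_batches(batches: list) -> list:
    """Hypothetical rule: sort batches by total OT cost to all other batches, lowest first."""
    n = len(batches)
    costs = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            costs[i, j] = costs[j, i] = sinkhorn_cost(batches[i], batches[j])
    return [int(k) for k in np.argsort(costs.sum(axis=1))]
```

With three batches that are shifted copies of each other by 0, 1, and 6 units, the middle batch has the lowest total cost and comes first, while the far-away batch comes last.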
For more information, please refer to our article: https://doi.org/10.1186/s12859-025-06087-3
The SCITUNA workflow consists of five main stages: a) preprocessing and normalization, b) dimensionality reduction and clustering, c) construction of the intra-graphs and the inter-graph, d) anchor selection, and e) integration, followed by f) visualization of the integration results.
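Stages a) and b) follow common single-cell conventions. The following numpy sketch illustrates them: per-cell library-size normalization to a fixed total, a log(1 + x) transform, and principal component analysis (PCA) via singular value decomposition (SVD). Clustering is omitted. SCITUNA's actual preprocessing parameters are set in its own scripts. The `target_sum=1e4` and `n_comps` values and both function names are assumptions for illustration.

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to the same total count, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)


def pca(x: np.ndarray, n_comps: int = 2) -> np.ndarray:
    """Project cells onto the top principal components using the SVD of the centred matrix."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]  # PC scores, one row per cell
```

After `normalize_log1p`, undoing the log (`np.expm1`) gives rows that each sum to `target_sum`. The PC scores are centred, so each column has mean zero.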
Below are the steps to reproduce the results reported in the paper.
To download the datasets used in the paper, follow these steps:
- Navigate to the data directory:
  cd data
- Run the script to download a dataset. The dataset argument can be pancreas, lung, small_atac_peaks, or small_atac_windows:
  python get_data.py [dataset]
Example usage:
python get_data.py human_pancreas

To integrate multiple batches using SCITUNA, run the following command:
python multi_batch_integration.py --i [input_dataset] --b [batch_id] --c [num_cores]

Arguments:
--i (input_dataset): The dataset file located in "data/" (supported formats: H5AD).
--b (batch_id): The column name in ".obs" that indicates batch labels for integration.
--c (num_cores): Number of CPU cores to use for parallel processing.
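The three arguments above map naturally onto Python's standard `argparse` module. The parser below is a hypothetical mirror of the documented flags, not the parser inside multi_batch_integration.py. The destination names, help texts, the positive-integer check on `--c`, and the example file name `pancreas.h5ad` are our assumptions. It can be useful, for instance, in a wrapper script that validates arguments before launching a long integration run.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three documented flags (not SCITUNA's own code)."""
    parser = argparse.ArgumentParser(description="SCITUNA multi-batch integration (sketch)")
    parser.add_argument("--i", dest="input_dataset", required=True,
                        help='H5AD dataset file located in "data/"')
    parser.add_argument("--b", dest="batch_id", required=True,
                        help="column name in .obs that holds the batch labels")
    parser.add_argument("--c", dest="num_cores", type=int, required=True,
                        help="number of CPU cores for parallel processing")
    return parser


def parse_args(argv):
    """Parse argv and reject a non-positive core count (argparse exits with status 2)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.num_cores < 1:
        parser.error("--c must be a positive integer")
    return args
```

For example, `parse_args(["--i", "pancreas.h5ad", "--b", "batch", "--c", "8"])` returns a namespace with `num_cores == 8` as an integer.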
To perform pairwise batch integration using SCITUNA, run the following command:
python pairwise_integration.py --i [input_dataset] --b [batch_id] --c [num_cores]

The following steps set up the SCITUNA environment and install the necessary dependencies in both Python and R.
To ensure reproducibility and avoid conflicts with other packages, it is recommended to use a separate Conda environment for SCITUNA.
conda create -n SCITUNA python=3.10
conda activate SCITUNA

This creates and activates a new environment named SCITUNA with Python 3.10. You can choose a different Python version if needed, but compatibility with the required packages has been tested with Python 3.10.
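Because the dependencies were tested with Python 3.10, a script can warn early when it runs under a different interpreter. This is a minimal sketch. The function name and the choice to warn rather than fail are assumptions, not part of SCITUNA.

```python
import sys
import warnings

TESTED_VERSION = (3, 10)  # version the README reports the dependencies were tested with


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True on the tested major.minor version; otherwise warn and return False."""
    current = tuple(version_info[:2])
    if current != TESTED_VERSION:
        warnings.warn(
            f"SCITUNA's dependencies were tested with Python "
            f"{TESTED_VERSION[0]}.{TESTED_VERSION[1]}; this interpreter is {current[0]}.{current[1]}."
        )
        return False
    return True
```

Calling `check_python_version()` with no argument checks the running interpreter.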
Once the environment is active, install the required Python packages listed in requirements.txt using pip.
Make sure requirements.txt is in your current directory.
pip install -r requirements.txt

SCITUNA also leverages functionality from the R package Seurat, which is widely used for single-cell RNA-seq data analysis.
We recommend installing Seurat version 3, as SCITUNA was developed and tested using this version to ensure compatibility and reproducibility.
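Once Seurat is installed, you can confirm from Python that rpy2 can reach it and that the major version is 3. This is a minimal sketch. `robjects.r(...)` evaluates an R expression through rpy2, and `packageVersion` is base R. The helper names are hypothetical. The helper returns `None` when rpy2, R, or Seurat is unavailable, rather than raising.

```python
def seurat_version():
    """Return the installed Seurat version as a string, or None if rpy2/R/Seurat is unavailable."""
    try:
        import rpy2.robjects as robjects  # may fail if rpy2 or R is not installed
    except Exception:
        return None
    try:
        return str(robjects.r('as.character(packageVersion("Seurat"))')[0])
    except Exception:  # R-side error, e.g. Seurat is not installed
        return None


def is_seurat_v3(version) -> bool:
    """True when the version string starts with major version 3 (e.g. "3.2.3")."""
    return version is not None and version.split(".")[0] == "3"
```

With the micromamba command in this section, which pins r-seurat=3.2.3, `is_seurat_v3(seurat_version())` should return True.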
Use Micromamba, a lightweight, fast, standalone version of Mamba (an alternative to conda). Download the Linux x86-64 binary:
curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
Move it to a directory on your PATH:
mkdir -p ~/bin
mv bin/micromamba ~/bin/micromamba  # or /usr/local/bin/micromamba
Add ~/bin to your PATH (if it is not already there):
export PATH="$HOME/bin:$PATH"
Install rpy2, Seurat, and their dependencies:
micromamba install -c conda-forge \
rpy2 \
r-seurat=3.2.3 \
r-sctransform=0.3.2 \
r-cowplot r-fitdistrplus r-ggplot2 r-ggrepel r-ggridges \
r-httr r-igraph r-irlba r-leiden r-mass r-matrix r-miniui \
r-patchwork r-plotly r-png r-rann r-rcpp r-rcppannoy \
r-reticulate r-rsvd r-rtsne r-scattermore r-shiny r-uwot \
r-rcppeigen r-spatstat=1.64_1

We provide t-SNE and UMAP plots for a more detailed analysis of the results. You can access them through this Google Drive link.
Houdjedj, A., Marouf, Y., Myradov, M. et al. SCITUNA: single-cell data integration tool using network alignment. BMC Bioinformatics 26, 92 (2025). https://doi.org/10.1186/s12859-025-06087-3