gedi2py


Gene Expression Decomposition and Integration

A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.

The original R implementation, gedi2, is available at https://github.com/csglab/gedi2.

Overview

gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.

Installation

pip (recommended)

pip install gedi2py

From source

git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .

Requirements

  • Python >= 3.10
  • C++14 compiler
  • Eigen3 >= 3.3.0
  • CMake >= 3.15

See the Installation Guide for detailed instructions.
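
Before building from source, it can help to confirm that the basic prerequisites listed above are present. The following is a minimal sketch, not part of gedi2py: `check_build_prerequisites` is a hypothetical helper, and the compiler names it looks for are assumptions about common toolchains. It only checks the Python version and whether `cmake` and a C++ compiler are on the PATH; it does not verify the CMake version, C++14 support, or the Eigen3 installation.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the Requirements list

# Assumed executable names for common C++ compilers (not exhaustive).
CXX_CANDIDATES = ("c++", "g++", "clang++", "cl")


def check_build_prerequisites(python_version=None, which=shutil.which):
    """Return a list of problems found; an empty list means nothing obvious is missing.

    `python_version` and `which` are injectable so the check can be exercised
    without depending on the current machine.
    """
    if python_version is None:
        python_version = sys.version_info[:2]

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"but >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]} is required"
        )
    if which("cmake") is None:
        problems.append("cmake not found on PATH (CMake >= 3.15 is required)")
    if not any(which(name) for name in CXX_CANDIDATES):
        problems.append("no C++ compiler found on PATH (C++14 support is required)")
    return problems


# Report anything missing on the current machine.
for problem in check_build_prerequisites():
    print("Missing prerequisite:", problem)
```

An empty result only means that nothing obvious is missing; the Installation Guide remains the reference for the full setup.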

Quick Start

import gedi2py as gd
import scanpy as sc

# Load data
adata = sc.read_h5ad("data.h5ad")

# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])

Features

  • Memory-efficient: C++ backend keeps large matrices in native memory
  • Fast: OpenMP parallelization for multi-threaded optimization
  • scverse-compliant: Works seamlessly with AnnData and scanpy
  • Flexible: Supports counts, log-transformed data, paired data (e.g., CITE-seq), and binary indicators
  • Comprehensive: Includes projections, embeddings, imputation, and differential analysis

Paired Data Mode (M_paired)

gedi2py supports paired count data stored in two AnnData layers, useful for:

  • CITE-seq (ADT vs RNA)
  • Dual-modality assays
  • Ratio-based analyses

# Two layers: 'm1' (numerator counts) and 'm2' (denominator counts)
# GEDI models: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="m1",      # First count matrix
    layer2="m2",     # Second count matrix
    n_latent=10
)
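
The formula in the comment above, Yi = log((M1+1)/(M2+1)), can be illustrated on toy data. The sketch below is plain Python for illustration only and is not how gedi2py computes it internally; `paired_log_ratio` is a hypothetical helper name, and the small matrices are made-up example counts.

```python
import math


def paired_log_ratio(m1, m2):
    """Element-wise log((m1 + 1) / (m2 + 1)) for two equally shaped count matrices.

    Matrices are given as lists of rows. The +1 pseudocount keeps the ratio
    finite when either count is zero.
    """
    if len(m1) != len(m2):
        raise ValueError("m1 and m2 must have the same number of rows")
    result = []
    for row1, row2 in zip(m1, m2):
        if len(row1) != len(row2):
            raise ValueError("m1 and m2 must have the same number of columns")
        result.append([math.log((a + 1) / (b + 1)) for a, b in zip(row1, row2)])
    return result


# Toy counts: rows and columns are arbitrary here.
m1 = [[0, 3], [9, 1]]  # numerator counts
m2 = [[0, 1], [4, 1]]  # denominator counts

y = paired_log_ratio(m1, m2)
for row in y:
    print(row)
```

Entries where both counts are equal map to 0, and the sign of each entry shows which of the two matrices has the larger count.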

Documentation

Full documentation is available at csglab.github.io/gedi2py.

API Overview

gedi2py follows the scanpy convention with submodules:

Module  Description
gd.tl   Tools: model training, projections, embeddings, imputation, differential analysis
gd.pl   Plotting: embeddings, convergence, features
gd.io   I/O: H5AD, 10X formats, model persistence

import gedi2py as gd

# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)

# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)

# I/O
adata = gd.read_h5ad("data.h5ad")
gd.io.save_model(adata, "model.h5")

License

MIT License - see LICENSE for details.
