https://www.biorxiv.org/content/10.1101/2025.07.21.665956v1.abstract
This repository contains implementation code for DeepPathway.
- DeepPathway is a bimodal contrastive learning framework that is trained on Spatial Transcriptomics (ST) datasets to predict pathway expression from H&E images.
- UCell is used to compute pathway expression scores from the MSigDB hallmark pathway definitions.
- Once the model is trained, DeepPathway, unlike traditional contrastive learning methods, can directly predict the pathway expression of a test H&E image without requiring the training data (or its embeddings).
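To make the pathway score more concrete, here is a minimal sketch of a UCell-style rank score for a single spot. It assumes the formulation described in the UCell publication (a Mann-Whitney U statistic on capped gene ranks) and is not the `Ucell_code.R` script from this repository. The function name `ucell_style_score` is hypothetical, ties are broken by input order rather than averaged, and signature genes missing from the spot are simply skipped.

```python
def ucell_style_score(expression, signature, r_max):
    """Simplified UCell-style score for one spot.

    expression: dict mapping gene name -> expression value for the spot
    signature:  list of gene names defining the pathway
    r_max:      rank cap; genes ranked below it are all treated as r_max + 1
    """
    # Assumption: signature genes absent from the spot are ignored.
    genes = [g for g in signature if g in expression]
    n = len(genes)
    if n == 0:
        return 0.0

    # Rank 1 = most highly expressed gene (ties keep input order, a simplification).
    ranked = sorted(expression, key=expression.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ranked)}

    # Cap ranks beyond r_max so the lowly expressed tail does not dominate.
    capped = [rank[g] if rank[g] <= r_max else r_max + 1 for g in genes]

    # Mann-Whitney U statistic, then rescale so that 1 = all genes at the top.
    u = sum(capped) - n * (n + 1) / 2
    return 1.0 - u / (n * r_max)


# Toy spot with five genes (made-up values, for illustration only).
spot = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
print(ucell_style_score(spot, ["A", "B"], r_max=4))
print(ucell_style_score(spot, ["C", "D"], r_max=4))
```

A signature whose genes occupy the top ranks scores close to 1, and one whose genes fall beyond `r_max` scores close to 0. This is why the choice of the R_max threshold matters for pathway quantification.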
Create a conda environment with Python >=3.10 and the following packages:
- python: 3.10.14
- torch: 2.7.0+cu118
- timm: 1.0.12
- spatialdata: 0.4.0
- scanpy: 1.11.5
- numpy: 1.26.4
- pandas: 2.2.3
- scikit-learn: 1.5.2
- pillow: 11.0.0
- scipy: 1.12.0
- opencv: 4.12.0
- skimage: 0.24.0
- tiatoolbox: 1.6.0
- openslide: 1.4.1
- matplotlib: 3.9.3
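To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch using the standard-library `importlib.metadata`. The distribution names `opencv-python`, `scikit-image`, and `openslide-python` are assumptions about how opencv, skimage, and openslide were installed, so adjust them to your setup. The helper `check_environment` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Pinned versions from the list above. Keys are pip distribution names;
# opencv-python, scikit-image and openslide-python are assumed names.
PINNED = {
    "torch": "2.7.0+cu118",
    "timm": "1.0.12",
    "spatialdata": "0.4.0",
    "scanpy": "1.11.5",
    "numpy": "1.26.4",
    "pandas": "2.2.3",
    "scikit-learn": "1.5.2",
    "pillow": "11.0.0",
    "scipy": "1.12.0",
    "opencv-python": "4.12.0",
    "scikit-image": "0.24.0",
    "tiatoolbox": "1.6.0",
    "openslide-python": "1.4.1",
    "matplotlib": "3.9.3",
}


def check_environment(pinned=PINNED):
    """Return {distribution: (status, wanted, installed)} for each pinned package."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = ("missing", wanted, None)
            continue
        # Exact string comparison; build suffixes (e.g. 4.12.0.88) show up as "differs".
        status = "ok" if installed == wanted else "differs"
        report[dist] = (status, wanted, installed)
    return report


for dist, (status, wanted, installed) in check_environment().items():
    print(f"{dist:18s} {status:8s} wanted={wanted} installed={installed}")
```

A "differs" status is not necessarily a problem. It only flags packages whose installed version is not identical to the one listed here.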
- Download ST data from HEST-1k (https://huggingface.co/datasets/MahmoodLab/hest). Also download the HEST-1k metadata to obtain the H&E image resolutions (in MPP, microns per pixel), or use OpenSlide to read the resolution values.
- Obtain a login key for H-optimus (https://huggingface.co/bioptimus).
- Save the WSIs and ST data (.h5ad) in the ./WSIs and ./st folders under root_path='YOUR ROOT DIR'.
- Open config.py and add your SAMPLE_IDs to the "all_samples" list.
- Provide your pathway definition file. We obtained the MSigDB hallmark pathway definitions from here: https://maayanlab.cloud/Enrichr/#libraries.
- Add the MPP resolution of each WSI to config.py, or extract it from the metadata (from HEST-1k in our case).
- Restart the kernel after saving config.py. Run "python data_processing.py" or see Tutorials/data_processing.ipynb for data processing.
- "IMPORTANT": Run the UCell calculations before creating the SpatialData objects (which are used for model training and validation). Use the Ucell_code.R file with your configuration to store the spot x pathway matrix of each sample. Use the R_max threshold obtained for pathway expression quantification (it is calculated when running the data processing module).
- Set the test and train sample IDs. With the default settings, setting the test sample ID automatically creates the list of training sample IDs.
- Set the parameter "spot_embedding" to the number of model outputs, i.e., your number of pathways or number of genes.
- Choose your model, e.g., BLEEPOnly, BLEEPWithOptimus, or DeepPathway.
- Restart the kernel after saving config.py and run the "train.py" file.
- For predictions, use the test sample ID with the saved model weights. An example of obtaining predictions is provided in Tutorials/training.ipynb.
- Optionally, you can use scanpy for spatial visualization. The current code supports SpatialData integration for predictions.
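The configuration steps above (sample split and output size) can be sketched as follows. This is an illustration under stated assumptions, not the logic of config.py: the pathway definition file is assumed to be tab-separated with one pathway per line (name, a description column that may be empty, then genes). The helpers `parse_pathway_definitions` and `split_samples`, the sample IDs, and the pathway names are all hypothetical placeholders.

```python
def parse_pathway_definitions(lines):
    """Parse GMT-like lines: name <TAB> description <TAB> gene1 <TAB> gene2 ...

    Assumption: the second column is a description (possibly empty) and is skipped.
    """
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        name = fields[0].strip()
        if not name:
            continue  # skip blank lines
        genes = [g.strip() for g in fields[2:] if g.strip()]
        pathways[name] = genes
    return pathways


def split_samples(all_samples, test_sample):
    """Hold out one sample for testing; train on all remaining samples."""
    if test_sample not in all_samples:
        raise ValueError(f"{test_sample!r} is not in all_samples")
    train = [s for s in all_samples if s != test_sample]
    return train, [test_sample]


# Placeholder inputs for illustration only.
all_samples = ["SAMPLE_A", "SAMPLE_B", "SAMPLE_C"]
definition_lines = [
    "PATHWAY_ONE\t\tGENE1\tGENE2\tGENE3\n",
    "PATHWAY_TWO\t\tGENE4\tGENE5\n",
]

train_ids, test_ids = split_samples(all_samples, "SAMPLE_B")
pathways = parse_pathway_definitions(definition_lines)

# One model output per pathway, so the output size follows the definition file.
spot_embedding = len(pathways)

print(train_ids, test_ids, spot_embedding)
```

Deriving `spot_embedding` from the definition file rather than typing it by hand keeps the model output size consistent with the spot x pathway matrix when the pathway set changes.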