PIGE (Pathway-Informed Graph Explanation) is a deep learning framework that predicts anti-cancer drug response by understanding the underlying biological mechanisms. PIGE allows you to answer the following questions:
- Why is this cell line sensitive or resistant to this drug?
- Which pathways/genes/interactions drive the drug response or lack thereof?
- What happens to drug response if we block a specific pathway/gene/interaction?
Precomputed results can be explored using the Interactive PIGE Atlas: https://www.pigeatlas.com
PIGE integrates multi-omics data (mutations, copy number alterations, gene expression) with drug chemical structures through a biologically informed pathway interaction network. The model architecture processes genomic features through this pathway graph, creating interpretable pathway-level representations. By systematically knocking out pathways and genes and measuring changes in model predictions, PIGE generates importance scores that identify influential biological mechanisms.
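The knockout-based scoring described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `toy_predict`, the pathway names, and the weights stand in for a trained model, and setting a feature to zero is only one assumed way to "knock out" a pathway. It is not PIGE's actual implementation.

```python
from typing import Callable, Dict


def toy_predict(pathway_activity: Dict[str, float]) -> float:
    """Hypothetical stand-in for a trained response model (a weighted sum)."""
    weights = {"EGFR signaling": 0.6, "Apoptosis": -0.3, "DNA repair": 0.1}
    return sum(weights[name] * value for name, value in pathway_activity.items())


def knockout_importance(
    predict: Callable[[Dict[str, float]], float],
    features: Dict[str, float],
) -> Dict[str, float]:
    """Score each feature by how much the prediction changes when it is zeroed."""
    baseline = predict(features)
    scores = {}
    for name in features:
        perturbed = dict(features)   # copy so the original input is untouched
        perturbed[name] = 0.0        # assumed knockout: set the activity to zero
        scores[name] = baseline - predict(perturbed)
    return scores


# Made-up pathway activities for a single cell line-drug pair.
activity = {"EGFR signaling": 1.0, "Apoptosis": 0.5, "DNA repair": 2.0}
scores = knockout_importance(toy_predict, activity)

# Rank pathways by the magnitude of their effect on the prediction.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(ranked)
```

The sign of each score indicates whether removing the feature raised or lowered the toy prediction, and the magnitude gives the ranking.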
Improved Prediction Performance and Generalizability
- Spearman ρ=0.84 in 5-fold cross-validation on CTRPv2 training data
- Spearman ρ=0.56 on external GDSC2 validation (excluding all overlapping cell line-drug pairs)
- Outperforms DrugCell (ρ=0.52), DRPreter (ρ=0.51), and other state-of-the-art methods on external validation
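As a rough illustration of the evaluation reported above, the sketch below drops cell line-drug pairs that also appear in the training set and then computes Spearman's ρ, which is the Pearson correlation of the ranks. The response values are made up and the function names are illustrative; this is not the repository's evaluation code.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data: (cell line, drug) -> (observed response, predicted response).
train_pairs = {("A549", "erlotinib"), ("MCF7", "paclitaxel")}
external = {
    ("A549", "erlotinib"): (0.9, 0.8),  # also in training, so it is excluded
    ("HT29", "erlotinib"): (0.2, 0.3),
    ("PC9", "erlotinib"): (0.7, 0.6),
    ("HeLa", "paclitaxel"): (0.4, 0.5),
    ("U87", "paclitaxel"): (0.6, 0.4),
}

held_out = {pair: v for pair, v in external.items() if pair not in train_pairs}
observed = [v[0] for v in held_out.values()]
predicted = [v[1] for v in held_out.values()]
rho = spearman_rho(observed, predicted)
print(f"Spearman rho on {len(held_out)} held-out pairs: {rho:.2f}")
```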
Enhanced Interpretability
- 74% pathway hit rate at K=25 vs 50% for CRISPR differential essentiality, 60% for GSEA, and 29% for DrugCell
- Over 60% gene hit rate at K=100 (representing 4.8% of all genes) vs 41% for CRISPR screens
- Recovers specific mechanisms of action including non-obvious targets (e.g. CX3CL1 as a resistance hub to chemotherapy in triple-negative breast cancer, only discovered experimentally in 2017)
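The hit rates above depend on how a "hit" is defined, which this README does not spell out. The sketch below assumes one common definition: a drug counts as a hit if at least one of its known targets appears among its top-K ranked items, and the hit rate is the fraction of drugs with a hit. The drug names, gene names, and rankings are placeholders.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    rankings: Dict[str, List[str]],
    known_targets: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of drugs with at least one known target in their top-k items."""
    hits = 0
    for drug, ranked_items in rankings.items():
        top_k = set(ranked_items[:k])
        if top_k & known_targets.get(drug, set()):
            hits += 1
    return hits / len(rankings)


# Placeholder rankings, most important item first.
rankings = {
    "drug_a": ["G1", "G2", "G3", "G4"],
    "drug_b": ["G5", "G6", "G7", "G8"],
    "drug_c": ["G9", "G1", "G2", "G3"],
}
known_targets = {"drug_a": {"G2"}, "drug_b": {"G8"}, "drug_c": {"G4"}}

rate_at_2 = hit_rate_at_k(rankings, known_targets, k=2)
rate_at_4 = hit_rate_at_k(rankings, known_targets, k=4)
print(f"hit rate at K=2: {rate_at_2:.2f}, at K=4: {rate_at_4:.2f}")
```

Under this definition the hit rate can only stay the same or increase as K grows, which is why K is reported alongside each figure.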
Enhanced Clinical Translation Potential
- Validated on BeatAML ex vivo patient samples
- Generalizes to unseen drugs and drug classes (e.g. MDM2 inhibitors not present in training data)
- Identifies patient-specific resistance mechanisms that suggest rational combination therapies
PIGE is based on a biologically informed pathway interaction network constructed from Gene Ontology (GO) biological processes and OmniPath protein-protein interactions. The architecture consists of three integrated components:
This design mirrors biology: each pathway's influence on a cell's response is determined by its interactions with upstream pathways, the cell's genomics, and the drug's properties.
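To make the idea of a pathway interaction network concrete, here is a minimal sketch of one way such a graph could be derived. It assumes that pathways are gene sets and that a directed edge links pathway A to pathway B whenever a protein-protein interaction runs from a gene in A to a gene in B. That rule, the function name, and the toy data are assumptions for illustration; the actual construction lives in `src/pathway_interaction_setup/` and may differ.

```python
from typing import Dict, Iterable, Set, Tuple


def build_pathway_edges(
    gene_sets: Dict[str, Set[str]],
    ppi_edges: Iterable[Tuple[str, str]],
) -> Set[Tuple[str, str]]:
    """Lift gene-level interactions to directed pathway-level edges."""
    # Invert the gene sets so each gene maps to the pathways containing it.
    gene_to_pathways: Dict[str, Set[str]] = {}
    for pathway, genes in gene_sets.items():
        for gene in genes:
            gene_to_pathways.setdefault(gene, set()).add(pathway)

    edges: Set[Tuple[str, str]] = set()
    for source_gene, target_gene in ppi_edges:
        for upstream in gene_to_pathways.get(source_gene, ()):
            for downstream in gene_to_pathways.get(target_gene, ()):
                if upstream != downstream:  # skip interactions within a pathway
                    edges.add((upstream, downstream))
    return edges


# Toy inputs: placeholder pathways and directed gene-gene interactions.
gene_sets = {
    "pathway_1": {"gene_a", "gene_b"},
    "pathway_2": {"gene_c"},
    "pathway_3": {"gene_d", "gene_b"},
}
ppi_edges = [("gene_a", "gene_c"), ("gene_c", "gene_d")]

pathway_edges = build_pathway_edges(gene_sets, ppi_edges)
print(sorted(pathway_edges))
```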
If you use PIGE in your research, please cite:
[Will be added when paper is published]
- Interactive PIGE Atlas: https://www.pigeatlas.com - Explore 45,000+ pathway graphs across 70 drugs and 800+ cell lines
The project consists of the following folders:
- data/: Contains all data on which the models were built
  - input_data/: Raw input data (omics, drug response, pathway data)
  - intermediate_data/: Processed intermediate files (pathway graphs, precomputed features)
  - output_data/: Model outputs and trained model checkpoints
- src/: Source code for PIGE
  - data_processing/: Data preprocessing and feature generation
  - models/: Model architectures (Structural Causal Model, GAT layers)
  - pathway_interaction_setup/: Pathway graph construction from GO and OmniPath
  - model_interpretability/: Virtual knockout analysis and interpretability tools
  - pige_atlas/: Interactive graph visualization and atlas generation
  - train/: Training scripts and experiment management
  - train_test/: External validation and evaluation scripts
- notebooks/: Jupyter notebook tutorials and examples
  - 01_train_erlotinib.ipynb: Train PIGE on a single drug
  - 02_knockout_analysis.ipynb: Virtual knockout analysis
  - 03_generate_pige_graphs.ipynb: Generate interactive pathway graphs
We recommend using a virtual environment to manage dependencies:
# Create virtual environment (using conda)
conda create -n pige python=3.10
conda activate pige
# Or using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install PyTorch (adjust the CUDA version as needed)
# The example below uses the CUDA 11.6 (cu116) wheels
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install PyTorch Geometric and dependencies
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-1.12.0+cu116.html
pip install torch-geometric
# Install other dependencies
pip install -r requirements.txt

Start with the tutorial notebooks in notebooks/:
- Train a model: notebooks/01_train_erlotinib.ipynb (~15 minutes)
- Run interpretability: notebooks/02_knockout_analysis.ipynb (~5 minutes)
- Generate graphs: notebooks/03_generate_pige_graphs.ipynb (~5 minutes)
For programmatic usage, use the main pipeline:
from src.pipeline.pipeline_orchestrator import PipelineOrchestrator
from src.main import resolve_config_variables
import yaml
# Load and resolve config
with open('src/configs/config.yaml', 'r') as f:
    config = yaml.safe_load(f)
config = resolve_config_variables(config)
# Run pipeline
orchestrator = PipelineOrchestrator(config)
orchestrator.run()

Interactive tutorials demonstrating PIGE's capabilities:
| Tutorial | Description | Runtime | Link |
|---|---|---|---|
| Train PIGE | Train PIGE on a single drug (Erlotinib) using the full 5-stage pipeline | ~15 min | notebooks/01_train_erlotinib.ipynb |
| Virtual Knockout Analysis | Identify key pathways and genes driving drug sensitivity vs resistance | ~5 min | notebooks/02_knockout_analysis.ipynb |
| Generate PIGE Graphs | Create interactive visualizations of pathway crosstalk and mechanisms | ~5 min | notebooks/03_generate_pige_graphs.ipynb |
PIGE uses the following publicly available datasets:
- CTRPv2 - Cancer Therapeutics Response Portal v2 (drug response data)
- GDSC - Genomics of Drug Sensitivity in Cancer (external validation)
- DepMap - Dependency Map (omics data: mutations, CNAs, expression)
- Gene Ontology - Biological process annotations (pathway definitions)
- OmniPath - Protein-protein interaction network (pathway crosstalk)
- Python: 3.10+
- PyTorch: 1.13+ (with CUDA support recommended)
- CUDA: 11.0+ (GPU recommended, 8GB+ VRAM)
- Training and interpretability will work on CPU but will be significantly slower
- Disk Space: ~30GB for data and intermediate files (if training on all 68 drugs)
- RAM: 16GB+ needed for interpretability analysis
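A quick way to compare a machine against the requirements above is a small check script. The sketch below is not part of the repository and `check_environment` is a hypothetical helper. It reports the Python version check and, if PyTorch is installed, the PyTorch version, CUDA availability, and the memory of the first GPU.

```python
import sys
from typing import Any, Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, Any]:
    """Collect basic facts about the Python and PyTorch setup."""
    report: Dict[str, Any] = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_version": None,
        "cuda_available": False,
        "gpu_memory_gb": None,
    }
    try:
        import torch  # optional: the check still runs if PyTorch is missing
    except ImportError:
        return report

    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Total memory of the first visible GPU, converted from bytes to GiB.
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["gpu_memory_gb"] = round(total_bytes / 1024**3, 1)
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

Disk space and RAM are not checked here; they depend on how many drugs are trained and which analyses are run.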
For questions, issues, or collaboration inquiries:
- Email: cbahl076@uottawa.ca
- Issues: Please open an issue on GitHub for bug reports or feature requests
