A Nextflow pipeline for extracting features from histology whole-slide images (WSIs) using multiple patch and slide encoders via TRIDENT.
This pipeline performs the following steps:
- Segmentation: Tissue segmentation of whole slide images
- Coordinate Extraction: Extraction of patch coordinates based on configuration parameters
- Patch Feature Extraction: Extraction of patch-level features using various patch encoders
- Slide Feature Extraction: Aggregation of patch features to slide-level representations using slide encoders
The pipeline is optimized to avoid redundant computation by:
- Running segmentation and coordinate extraction only once per unique configuration of (patch_size, mag, batch_size, overlap)
- Reusing these results for all encoder combinations that share the same configuration
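The reuse rule above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's Nextflow logic: it assumes rows shaped like those in params/feature_extractors.csv and groups them by the (patch_size, mag, batch_size, overlap) key, so each group would need segmentation and coordinate extraction once. The helper name `group_by_config` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Columns that define a shared segmentation/coordinate configuration.
CONFIG_KEYS = ("patch_size", "mag", "batch_size", "overlap")


def group_by_config(csv_text):
    """Group encoder rows by their (patch_size, mag, batch_size, overlap) key.

    Hypothetical helper for illustration; the pipeline does this in Nextflow.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[k].strip() for k in CONFIG_KEYS)
        groups[key].append(
            (row["patch_encoder"].strip(), row["slide_encoder"].strip())
        )
    return dict(groups)


EXAMPLE = """patch_encoder,slide_encoder,patch_size,mag,batch_size,overlap
uni_v1,mean-uni_v1,256,20,200,0
ctranspath,chief,256,20,200,0
"""

if __name__ == "__main__":
    for config, encoders in group_by_config(EXAMPLE).items():
        print(config, "->", encoders)
```

With the two example rows, both encoder pairs fall under a single key, so one segmentation and one coordinate extraction would serve both.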
Requirements:
- Nextflow (>= 25.04.7)
- Python 3.10+ (for PRISM encoder support)
- TRIDENT repository cloned and configured
- CHIEF repository (if using CHIEF slide encoder)
- Access to required model weights and checkpoints
- Hugging Face account with access token (for downloading model weights)
Installation:
- Clone this repository:
    git clone https://github.com/digenoma-lab/HistologyFeatureExtraction.git
    cd HistologyFeatureExtraction
- Install Python dependencies:
    pip install -r requirements.txt
- Set up TRIDENT:
  - Clone the TRIDENT repository
  - Configure model paths in trident/slide_encoder_models/local_ckpts.json
  - See TRIDENT documentation for details
- Set up CHIEF (if using CHIEF encoder):
  - Clone the CHIEF repository
  - Download required weights
  - Update params/params.yml with the CHIEF directory path
Edit params/params.yml to configure:
- dataset: Path to CSV file with list of WSIs to process
- feature_extractors: Path to CSV file with encoder configurations
- wsi_dir: Directory containing WSI files
- outdir: Output directory for results
- trident_dir: Path to TRIDENT repository
- chief_dir: Path to CHIEF repository (if using)
- token: Hugging Face access token for downloading model weights (required)
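A quick check for missing parameters could look like the sketch below. It works on a plain dictionary and leaves open how params/params.yml is loaded (a YAML parser is assumed but not shown). The function name `missing_params` and the placeholder paths are hypothetical; the parameter names come from the list above.

```python
# Parameter names taken from the list above; chief_dir is only needed
# when the CHIEF slide encoder is used.
REQUIRED = (
    "dataset",
    "feature_extractors",
    "wsi_dir",
    "outdir",
    "trident_dir",
    "token",
)


def missing_params(params, uses_chief=False):
    """Return the names of required parameters that are absent or empty."""
    required = list(REQUIRED)
    if uses_chief:
        required.append("chief_dir")
    return [name for name in required if not params.get(name)]


# Placeholder values for illustration only; the /path/to/... entries are
# not real locations from the repository.
example = {
    "dataset": "params/custom_wsis.csv",
    "feature_extractors": "params/feature_extractors.csv",
    "wsi_dir": "/path/to/wsis",
    "outdir": "results",
    "trident_dir": "/path/to/TRIDENT",
}
```

Here `missing_params(example)` reports `token`, which matches the case where the token is supplied on the command line instead of in the file.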
The params/feature_extractors.csv file defines which encoders to use. Format:
patch_encoder,slide_encoder,patch_size,mag,batch_size,overlap
uni_v1,mean-uni_v1,256,20,200,0
ctranspath,chief,256,20,200,0

The params/custom_wsis.csv file lists WSIs to process. Format:
case_id,wsi
TCGA-3C-AAAU,TCGA-3C-AAAU-01A-01-TS1.2F52DD63-7476-4E85-B7C6-E06092DB6CC1.svs

Run the pipeline:
    nextflow run main.nf -profile kutral -params-file params/params.yml --token <HUGGINGFACE_TOKEN>

Note: The --token parameter is required. You can obtain a Hugging Face access token from https://huggingface.co/settings/tokens. Make sure your token has access to the required model repositories.
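Before launching a run, it can help to confirm that every slide listed in the dataset CSV is present in wsi_dir. The sketch below is one way to do that. It assumes the `case_id,wsi` header shown above and that the `wsi` column holds a file name relative to wsi_dir; how the pipeline itself resolves paths is not confirmed here, and `missing_wsis` is a hypothetical helper.

```python
import csv
import os


def missing_wsis(dataset_csv, wsi_dir):
    """Return (case_id, wsi) pairs whose file is not found under wsi_dir.

    Assumes a CSV header of case_id,wsi and that wsi is a file name
    relative to wsi_dir (an assumption about the pipeline's behavior).
    """
    missing = []
    with open(dataset_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            name = row["wsi"].strip()
            if not os.path.isfile(os.path.join(wsi_dir, name)):
                missing.append((row["case_id"].strip(), name))
    return missing
```

An empty result means every listed slide was found; any returned pair points to a row worth fixing before submitting jobs.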
Available profiles:
- kutral: For SLURM cluster execution (ngen-ko queue)
- local: For local execution
Results are organized in the output directory:
- results/segmentation/: Segmentation results (contours, thumbnails)
- results/coordinates/: Extracted patch coordinates
- results/patch_features/: Patch-level feature files
- results/slide_features/: Slide-level feature files and all intermediate outputs
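To see at a glance which stages have produced output, a small script can count the entries in each sub-directory listed above. This is a sketch that assumes only the four directory names and nothing about the files inside them; `summarize_outputs` is a hypothetical helper.

```python
import os

# Sub-directory names as listed above, relative to the output directory.
RESULT_SUBDIRS = ("segmentation", "coordinates", "patch_features", "slide_features")


def summarize_outputs(outdir):
    """Map each expected sub-directory to the number of entries it contains.

    A missing sub-directory maps to None. The internal layout of each
    directory is not assumed here.
    """
    summary = {}
    for name in RESULT_SUBDIRS:
        path = os.path.join(outdir, name)
        summary[name] = len(os.listdir(path)) if os.path.isdir(path) else None
    return summary
```

Calling `summarize_outputs("results")` after a run gives a per-stage count; a `None` value indicates a stage whose directory was never created.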
Supported patch encoders:
- uni_v1, uni_v2
- phikon_v2
- resnet50
- virchow, virchow2
- conch_v15
- ctranspath
Supported slide encoders:
- mean-* (mean pooling over patch features)
- titan
- chief
- prism
See TRIDENT documentation for full list and requirements.
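The mean-* slide encoders are described above as mean pooling over patch features. The idea can be shown with plain Python lists. This is a conceptual sketch, not TRIDENT's implementation, which presumably operates on tensors; `mean_pool` is a hypothetical name.

```python
def mean_pool(patch_features):
    """Average equal-length patch feature vectors into one slide vector."""
    if not patch_features:
        raise ValueError("need at least one patch feature vector")
    dim = len(patch_features[0])
    if any(len(vec) != dim for vec in patch_features):
        raise ValueError("all patch feature vectors must have the same length")
    count = len(patch_features)
    # Average each feature dimension across all patches.
    return [sum(vec[i] for vec in patch_features) / count for i in range(dim)]


# Three toy 2-dimensional patch embeddings; real embeddings are typically
# much larger.
patches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slide_vector = mean_pool(patches)  # [3.0, 4.0]
```

The slide vector has the same dimensionality as each patch vector, whatever the number of patches.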
main.nf
├── segmentation (runs once per dataset)
├── preprocessing
│   ├── extract_coordinates (runs once per unique config)
└── feature_extraction
    ├── patch_features (runs for each encoder combination)
    └── slide_features (runs for each encoder combination)
If you use this pipeline, please cite:
See LICENSE file for details.
Gabriel Cabas - DiGenoma Lab
