Histology Feature Extraction Pipeline

A Nextflow pipeline for extracting features from histology whole slide images (WSIs) using multiple patch and slide encoders via TRIDENT.

Overview

This pipeline performs the following steps:

Segmentation: Tissue segmentation of whole slide images
Coordinate Extraction: Extraction of patch coordinates based on configuration parameters
Patch Feature Extraction: Extraction of patch-level features using various patch encoders
Slide Feature Extraction: Aggregation of patch features to slide-level representations using slide encoders

The pipeline is optimized to avoid redundant computation by:

Running segmentation and coordinate extraction only once per unique configuration of (patch_size, mag, batch_size, overlap)
Reusing these results for all encoder combinations that share the same configuration
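
The reuse logic described above can be illustrated with a short Python sketch. It is not the pipeline's Nextflow code; it only shows the grouping idea, assuming each encoder combination is a record with the fields patch_encoder, slide_encoder, patch_size, mag, batch_size, and overlap. The third row is a made-up example added to produce a second group.

```python
from collections import defaultdict

# Illustrative rows; the third one is hypothetical and exists only to
# create a second configuration group.
ROWS = [
    {"patch_encoder": "uni_v1", "slide_encoder": "mean-uni_v1",
     "patch_size": 256, "mag": 20, "batch_size": 200, "overlap": 0},
    {"patch_encoder": "ctranspath", "slide_encoder": "chief",
     "patch_size": 256, "mag": 20, "batch_size": 200, "overlap": 0},
    {"patch_encoder": "conch_v15", "slide_encoder": "titan",
     "patch_size": 512, "mag": 20, "batch_size": 200, "overlap": 0},
]

# Fields that define a unique segmentation/coordinate configuration.
CONFIG_KEYS = ("patch_size", "mag", "batch_size", "overlap")


def group_by_config(rows):
    """Map each unique config tuple to the encoder pairs that share it."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in CONFIG_KEYS)
        groups[key].append((row["patch_encoder"], row["slide_encoder"]))
    return dict(groups)


groups = group_by_config(ROWS)
for config, encoders in groups.items():
    # One shared preprocessing run per config, reused by every encoder pair.
    print(config, "->", encoders)
```

In this sketch the first two encoder combinations land in the same group, so their shared preprocessing would run once and be reused.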

Requirements

Nextflow (>= 25.04.7)
Python 3.10+ (for PRISM encoder support)
TRIDENT repository cloned and configured
CHIEF repository (if using CHIEF slide encoder)
Access to required model weights and checkpoints
Hugging Face account with access token (for downloading model weights)

Installation

Clone this repository:

git clone https://github.com/digenoma-lab/HistologyFeatureExtraction.git
cd HistologyFeatureExtraction

Install Python dependencies:

pip install -r requirements.txt

Set up TRIDENT:
- Clone the TRIDENT repository
- Configure model paths in trident/slide_encoder_models/local_ckpts.json
- See TRIDENT documentation for details

Set up CHIEF (if using CHIEF encoder):
- Clone the CHIEF repository
- Download required weights
- Update params/params.yml with the CHIEF directory path

Configuration

Parameters File

Edit params/params.yml to configure:

dataset: Path to CSV file with list of WSIs to process
feature_extractors: Path to CSV file with encoder configurations
wsi_dir: Directory containing WSI files
outdir: Output directory for results
trident_dir: Path to TRIDENT repository
chief_dir: Path to CHIEF repository (if using)
token: Hugging Face access token for downloading model weights (required)

Feature Extractors Configuration

The params/feature_extractors.csv file defines which encoders to use. Format:

patch_encoder,slide_encoder,patch_size,mag,batch_size,overlap
uni_v1,mean-uni_v1,256,20,200,0
ctranspath,chief,256,20,200,0
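
Before launching a run, it can help to check that the file parses as expected. The following is a minimal sketch, not part of the pipeline: the function name load_feature_extractors is hypothetical, and treating the four numeric columns as integers is an assumption based on the sample rows above.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "patch_encoder", "slide_encoder",
    "patch_size", "mag", "batch_size", "overlap",
]
# Assumption: these columns hold integers, as in the sample rows.
INT_COLUMNS = ("patch_size", "mag", "batch_size", "overlap")


def load_feature_extractors(handle):
    """Parse the encoder CSV, checking the header and numeric columns."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in INT_COLUMNS:
            try:
                row[col] = int(row[col])
            except (TypeError, ValueError):
                raise ValueError(
                    f"line {line_no}: {col} must be an integer"
                ) from None
        rows.append(row)
    return rows


SAMPLE = """patch_encoder,slide_encoder,patch_size,mag,batch_size,overlap
uni_v1,mean-uni_v1,256,20,200,0
ctranspath,chief,256,20,200,0
"""

rows = load_feature_extractors(io.StringIO(SAMPLE))
print(rows[0])
```

To check a real file, pass an open handle instead of the in-memory sample, for example open("params/feature_extractors.csv", newline="").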

Dataset Configuration

The params/custom_wsis.csv file lists WSIs to process. Format:

case_id,wsi
TCGA-3C-AAAU,TCGA-3C-AAAU-01A-01-TS1.2F52DD63-7476-4E85-B7C6-E06092DB6CC1.svs
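
A quick pre-flight check can catch dataset rows whose slide file is missing. The sketch below is not part of the pipeline. It assumes the wsi column holds a file name relative to wsi_dir, and the helper name find_missing_wsis is hypothetical. The demo runs against a temporary directory so it works anywhere.

```python
import csv
import tempfile
from pathlib import Path


def find_missing_wsis(dataset_csv, wsi_dir):
    """Return (case_id, wsi) pairs whose file is absent from wsi_dir."""
    missing = []
    with open(dataset_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumption: wsi is a file name relative to wsi_dir.
            if not (Path(wsi_dir) / row["wsi"]).is_file():
                missing.append((row["case_id"], row["wsi"]))
    return missing


# Demo on a throwaway directory with one present and one absent slide.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "present.svs").touch()
    dataset = tmp_dir / "custom_wsis.csv"
    dataset.write_text("case_id,wsi\ncase-1,present.svs\ncase-2,absent.svs\n")
    demo_missing = find_missing_wsis(dataset, tmp_dir)

print(demo_missing)
```

For a real run, call find_missing_wsis with the dataset and wsi_dir values from params/params.yml.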

Usage

Basic Usage

nextflow run main.nf -profile kutral -params-file params/params.yml --token <HUGGINGFACE_TOKEN>

Note: The --token parameter is required. You can obtain a Hugging Face access token from https://huggingface.co/settings/tokens. Make sure your token has access to the required model repositories.
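
One way to avoid hardcoding the token in scripts is to read it from an environment variable and assemble the command programmatically. This is only a sketch: the variable name HF_TOKEN and the helper build_command are assumptions, while the flags are the ones shown in the command above.

```python
import os
import shlex


def build_command(token, profile="kutral", params_file="params/params.yml"):
    """Assemble the nextflow invocation as an argument list."""
    if not token:
        raise ValueError("a Hugging Face token is required")
    return [
        "nextflow", "run", "main.nf",
        "-profile", profile,
        "-params-file", params_file,
        "--token", token,
    ]


# HF_TOKEN is an assumed variable name, not something the pipeline defines.
token = os.environ.get("HF_TOKEN", "")
if token:
    cmd = build_command(token)
    # Print everything except the token itself.
    print(shlex.join(cmd[:-1]), "<token hidden>")
    # To launch: import subprocess; subprocess.run(cmd, check=True)
else:
    print("Set HF_TOKEN before launching the pipeline")
```

Passing profile="local" to build_command switches to local execution.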

Profiles

kutral: For SLURM cluster execution (ngen-ko queue)
local: For local execution

Output

Results are organized in the output directory:

results/segmentation/: Segmentation results (contours, thumbnails)
results/coordinates/: Extracted patch coordinates
results/patch_features/: Patch-level feature files
results/slide_features/: Slide-level feature files and all intermediate outputs

Supported Encoders

Patch Encoders

uni_v1, uni_v2
phikon_v2
resnet50
virchow, virchow2
conch_v15
ctranspath

Slide Encoders

mean-* (mean pooling over patch features)
titan
chief
prism

See TRIDENT documentation for full list and requirements.
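
To make the mean-* entry concrete, here is a minimal pure-Python sketch of mean pooling over patch features. It illustrates the operation only and is not TRIDENT's implementation; the function name mean_pool and the toy 4-dimensional vectors are made up for the example.

```python
def mean_pool(patch_features):
    """Average patch feature vectors dimension by dimension."""
    if not patch_features:
        raise ValueError("at least one patch feature vector is required")
    dim = len(patch_features[0])
    if any(len(vec) != dim for vec in patch_features):
        raise ValueError("all patch feature vectors must have the same length")
    n = len(patch_features)
    # One output value per feature dimension: the mean across all patches.
    return [sum(vec[i] for vec in patch_features) / n for i in range(dim)]


# Three toy patch vectors; real encoders produce much longer vectors.
patches = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0, 8.0],
]
slide_vector = mean_pool(patches)
print(slide_vector)  # [3.0, 4.0, 5.0, 6.0]
```

The slide-level vector has the same length as each patch vector, regardless of how many patches the slide contains.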

Pipeline Structure

main.nf
├── segmentation (runs once per dataset)
├── preprocessing
│   └── extract_coordinates (runs once per unique config)
└── feature_extraction
    ├── patch_features (runs for each encoder combination)
    └── slide_features (runs for each encoder combination)

Citation

If you use this pipeline, please cite:

License

See LICENSE file for details.

Author

Gabriel Cabas - DiGenoma Lab
