WEHI-SODA-Hub/sp_segment is a pipeline for running cell segmentation on COMET and MIBI data. For COMET, background subtraction can be performed followed by patched cellpose segmentation, non-patched mesmer segmentation, or CellSAM foundation model segmentation. For MIBI, mesmer or CellSAM segmentation can be run. Whole-cell and nuclear segmentations are run separately, and then consolidated into whole cells with nuclei with full shape and intensity measurements per compartment. The output GeoJSON files can be viewed in QuPath.
```mermaid
flowchart TD
    A("COMET TIFF") --> B["Extract markers"]
    B --> C["Background subtraction"]
    C --> D{"Segmentation method"} & O["Backsub TIFF"]
    N("COMET/MIBI TIFF") --> D
    D -- Cellpose (COMET only) --> S["Combine channels"]
    S --> E["sopa convert"]
    E --> F["sopa patchify"]
    F --> G["cellpose (nuclear)"]
    F --> H["cellpose (whole-cell)"]
    G --> I["sopa resolve"]
    H --> I
    I --> J["parquet to tiff"]
    J --> K["Cell measurement"]
    D -- Mesmer (COMET/MIBI) --> L["mesmer (nuclear)"]
    D -- Mesmer (COMET/MIBI) --> M["mesmer (whole-cell)"]
    L --> K
    M --> K
    K --> P("GeoJSON")
    K --> Q["segmentation report"]
    Q --> R("html file")
```

The pipeline uses the following tools:
- Background_subtraction -- background subtraction tool for COMET.
- MesmerSegmentation -- a CLI for running Mesmer segmentation of MIBI and OME-XML TIFFs.
- CellSAM -- a foundation model for cell segmentation across diverse imaging modalities.
- cellmeasurement -- a Groovy app that matches whole-cell segmentations with nuclei, and uses the QuPath API to calculate compartment measurements and intensities.
- KRONOS -- a foundation model for multiplex spatial proteomics that extracts rich embeddings for each cell.
- sopa -- we use the sopa CLI tool to patchify images and perform cellpose segmentation.
- spatialVis -- R package for spatial analyses, used to generate plots for the segmentation report.
Please see the docs for more detailed information on pipeline usage and output.
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Before running the workflow on actual data, test your setup with -profile test (to test cellpose segmentation) or -profile test_mesmer (to test mesmer segmentation).
If you are running this pipeline from WEHI, it has been set up to run on Seqera Platform.
Note
If you don't have a .gradle directory in your home, make sure you create it with mkdir $HOME/.gradle before running the pipeline. You don't need to do this if you are running via WEHI's Seqera Platform mentioned above.
Usage will depend on your desired steps. See usage docs for more detailed information.
Note
This step will only work with COMET OME-TIF files.
Prepare a sample sheet as follows:
samplesheet.csv:
sample,run_backsub,run_mesmer,run_cellpose,run_cellsam,tiff
sample1,true,true,false,false,/path/to/sample1.tiff
sample2,true,false,false,true,/path/to/sample2.tiff

You may also prefer to use YAML for your samplesheet; either format is supported:
samplesheet.yml:
- sample: sample1
  run_backsub: true
  run_mesmer: true
  run_cellpose: false
  run_cellsam: false
  tiff: /path/to/sample1.tiff
- sample: sample2
  run_backsub: true
  run_mesmer: true
  run_cellpose: false
  run_cellsam: false
  tiff: /path/to/sample2.tiff

Warning
Please ensure that your image name and all directories in your path do not contain spaces.
If you don't specify any segmentation algorithm to run (mesmer, cellpose, or cellsam), the pipeline will run background subtraction only.
Now, you can run the pipeline using:
nextflow run WEHI-SODA-Hub/sp_segment \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --outdir <OUTDIR>

Warning
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
Before running Mesmer, ensure that you have a deepcell access token and that you have set it in your Nextflow secrets:
nextflow secrets set DEEPCELL_ACCESS_TOKEN $YOUR_TOKEN

If you want to run Mesmer as your segmentation algorithm, you can specify a samplesheet like so:
sample,run_backsub,run_mesmer,tiff,nuclear_channel,membrane_channels
sample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8
sample2,false,true,/path/to/sample2.tiff,DAPI,CD45

The nuclear channel field accepts only one entry; membrane channels may have multiple values separated by : characters. If a channel name contains spaces, surround that name with quotes. For example, CD45:"HLA I".
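The channel formatting rule above can be sketched as a small helper. This is only an illustration of the `:`-separated, quote-on-spaces convention described in the text; the function name `format_membrane_channels` is hypothetical and not part of the pipeline.

```python
def format_membrane_channels(channels):
    """Join channel names with ':' and quote any name that contains a space.

    Hypothetical helper: it only mirrors the convention described in the
    README (e.g. CD45:"HLA I") and is not shipped with the pipeline.
    """
    parts = []
    for name in channels:
        # Names with spaces are wrapped in double quotes, others are kept as-is.
        parts.append(f'"{name}"' if " " in name else name)
    return ":".join(parts)


if __name__ == "__main__":
    print(format_membrane_channels(["CD45", "HLA I"]))
```

If the value is written into a CSV samplesheet with a CSV library, the library will apply its own escaping to the embedded quotes, so it is worth checking the resulting file by eye.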
You can also set the segmentation parameters for mesmer either via the CLI
(e.g., --combine_method prod) or in a config file passed to the workflow
via -c. See usage for a full list.
Note
You cannot run multiple segmentation methods (Mesmer, Cellpose, or CellSAM) on the same sample (with the same name). If you want to run multiple methods on a sample, put it on a different line and give it a different sample name.
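The samplesheet rules stated so far (no spaces in paths, and only one segmentation method per sample name) can be checked before launching a run. The following is a minimal sketch, assuming a CSV samplesheet with the column names shown earlier; `check_samplesheet` is a hypothetical helper, and the pipeline's own validation may differ.

```python
import csv
import io

# Column names taken from the samplesheet examples in this README.
SEGMENTATION_FLAGS = ("run_mesmer", "run_cellpose", "run_cellsam")


def is_true(value):
    """Interpret a samplesheet cell as a boolean flag."""
    return str(value).strip().lower() == "true"


def check_samplesheet(csv_text):
    """Return a list of human-readable problems found in a CSV samplesheet."""
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        sample = (row.get("sample") or "").strip()
        tiff = (row.get("tiff") or "").strip()
        if sample in seen:
            problems.append(f"line {line_no}: duplicate sample name '{sample}'")
        seen.add(sample)
        if " " in tiff:
            problems.append(f"line {line_no}: path contains spaces: '{tiff}'")
        enabled = [flag for flag in SEGMENTATION_FLAGS if is_true(row.get(flag, ""))]
        if len(enabled) > 1:
            problems.append(
                f"line {line_no}: multiple segmentation methods enabled: {', '.join(enabled)}"
            )
    return problems


if __name__ == "__main__":
    with open("samplesheet.csv", encoding="utf-8") as handle:
        for problem in check_samplesheet(handle.read()):
            print(problem)
```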
If you want to run Cellpose as your segmentation algorithm, you can specify a samplesheet like so:
sample,run_backsub,run_cellpose,tiff,nuclear_channel,membrane_channels
sample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8
sample2,false,true,/path/to/sample2.tiff,DAPI,CD45

As with Mesmer, the nuclear channel field accepts only one entry; membrane channels may
have multiple values separated by : characters. You can also set the segmentation
parameters either via the CLI (e.g., --combine_method prod) or in a config
file passed to the workflow via -c. See usage for a full list.
Cellpose will run in a parallelised patched workflow using sopa. To control the
patching process, you can use the patch_width_pixel and patch_overlap_pixel
parameters.
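To get a feel for how patch_width_pixel and patch_overlap_pixel interact, the sketch below estimates how many patches a sliding-window grid would produce for a given image. It uses generic sliding-window arithmetic (stride = width minus overlap) as an assumption; sopa may lay out its grid differently, and the function names are hypothetical.

```python
import math


def patches_along_axis(length, patch_width, overlap):
    """Number of overlapping windows needed to cover `length` pixels.

    Assumes a regular grid with stride = patch_width - overlap; this is a
    generic estimate, not necessarily the layout sopa produces.
    """
    if patch_width <= overlap:
        raise ValueError("patch width must be larger than the overlap")
    if length <= patch_width:
        return 1
    stride = patch_width - overlap
    return math.ceil((length - overlap) / stride)


def estimate_patch_count(image_width, image_height, patch_width, overlap):
    """Estimate the total number of square patches for a 2D image."""
    return patches_along_axis(image_width, patch_width, overlap) * patches_along_axis(
        image_height, patch_width, overlap
    )


if __name__ == "__main__":
    # Illustrative image size and patch settings only.
    print(estimate_patch_count(10000, 8000, 2000, 200))
```

Under these assumptions, smaller patches or larger overlaps increase the number of patches, and each patch runs nuclear and whole-cell cellpose separately.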
If you want to skip measurements (which may take some time for large images), you
can set the parameter skip_measurements to true.
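The parameters mentioned above can be collected in a file for Nextflow's -params-file option, which accepts JSON or YAML. The sketch below renders such a file as JSON; the parameter names come from this README, while the values (output directory, patch sizes) are illustrative assumptions, and `render_params` is a hypothetical helper.

```python
import json

# Parameter names are taken from this README; the values are examples only.
EXAMPLE_PARAMS = {
    "input": "samplesheet.csv",
    "outdir": "results",          # hypothetical output directory
    "combine_method": "max",
    "patch_width_pixel": 2000,    # hypothetical patch width
    "patch_overlap_pixel": 200,   # hypothetical patch overlap
    "skip_measurements": True,
}


def render_params(values):
    """Serialise pipeline parameters as JSON text for use with -params-file."""
    return json.dumps(values, indent=2) + "\n"


if __name__ == "__main__":
    with open("params.json", "w", encoding="utf-8") as handle:
        handle.write(render_params(EXAMPLE_PARAMS))
```

The resulting file would then be passed as -params-file params.json on the nextflow run command line.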
KRONOS is a foundation model for multiplex spatial proteomics that extracts rich, 384-dimensional embeddings for each cell. These embeddings capture cellular phenotype and microenvironment context, enabling downstream analysis like clustering, classification, and spatial analysis.
To enable KRONOS embeddings:
nextflow run main.nf \
    --input samplesheet.csv \
    --skip_kronos false \
    --kronos_model_path /path/to/kronos_model \
    --kronos_marker_metadata /path/to/marker_metadata.csv \
    --kronos_merge_geojson true \
    ...

- --skip_kronos (default: true): Set to false to enable KRONOS embedding extraction
- --kronos_model_path (required): Path to the KRONOS model checkpoint (.pt file)
- --kronos_marker_metadata (required): Path to marker metadata CSV file mapping marker IDs to names
- --kronos_merge_geojson (default: false): Merge embeddings into the cellmeasurement GeoJSON output
- --kronos_patch_size (default: 64): Patch size for cell-centered crops
- --kronos_batch_size (default: 32): Batch size for model inference
- --kronos_num_workers (default: 4): Number of DataLoader workers for parallel data loading
- --kronos_max_value (default: 65535): Maximum intensity value for normalization
- --kronos_marker_mapping (optional): JSON string mapping image marker names to KRONOS marker names
When --kronos_merge_geojson is enabled, the pipeline automatically creates a new segmentation mask directly from the GeoJSON polygons. This ensures one-to-one matching between KRONOS embeddings and cells in the GeoJSON output, eliminating the missing embeddings that would otherwise occur due to cell filtering in upstream segmentation/measurement steps.
This approach guarantees that:
- Every cell in the GeoJSON gets a KRONOS embedding
- No embeddings are wasted on filtered-out cells
- The merged GeoJSON contains complete data for all cells
Requirements: The GeoJSON mask generation requires shapely and rasterio Python packages, which are included in the KRONOS environment.
KRONOS produces the following outputs:
- *_kronos_embeddings.csv: CSV file with cell IDs, centroids, and 384 embedding dimensions
- *_marker_report.txt: Report showing which image channels were matched to KRONOS markers
- *_kronos_merged.geojson (if --kronos_merge_geojson=true): GeoJSON file with embeddings added as cell properties
The merged GeoJSON file contains all original cell measurements plus 384 additional properties (kronos_emb_0 through kronos_emb_383), enabling integrated analysis of morphology, intensity, and KRONOS embeddings.
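As a rough illustration of how the merged file could be consumed downstream, the sketch below pulls the kronos_emb_0 to kronos_emb_383 values out of each feature. The exact location of these keys is an assumption: the code looks directly in each feature's `properties` and, failing that, in a nested `measurements` object as QuPath exports commonly use. The function names are hypothetical.

```python
import json

EMBEDDING_DIM = 384  # kronos_emb_0 ... kronos_emb_383, as described above


def embedding_from_feature(feature, dim=EMBEDDING_DIM):
    """Return the embedding vector for one GeoJSON feature, or None if absent."""
    props = feature.get("properties") or {}
    # Assumption: keys sit either directly in "properties" or in a nested
    # "measurements" object; check both locations.
    for source in (props, props.get("measurements")):
        if isinstance(source, dict) and "kronos_emb_0" in source:
            return [float(source[f"kronos_emb_{i}"]) for i in range(dim)]
    return None


def load_embeddings(geojson_text):
    """Parse GeoJSON text and return a list of embedding vectors."""
    data = json.loads(geojson_text)
    # Accept either a FeatureCollection or a bare list of features.
    features = data["features"] if isinstance(data, dict) else data
    vectors = []
    for feature in features:
        vector = embedding_from_feature(feature)
        if vector is not None:
            vectors.append(vector)
    return vectors


if __name__ == "__main__":
    # Hypothetical file name following the *_kronos_merged.geojson pattern.
    with open("sample1_kronos_merged.geojson", encoding="utf-8") as handle:
        print(len(load_embeddings(handle.read())), "cells with embeddings")
```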
KRONOS expects specific marker names based on its training data. The pipeline automatically performs case-insensitive matching between your image channel names and the KRONOS marker metadata. For markers that don't auto-match, use --kronos_marker_mapping:
--kronos_marker_mapping '{"CD3e": "CD3E", "PanCK": "PANCK"}'

For COMET data with fluorophore suffixes in channel names, you can map them like this:

--kronos_marker_mapping '{"DAPI": "DAPI", "FOXP3_T - TRITC": "FOXP3", "CD3_T - Cy5": "CD3"}'

KRONOS automatically uses GPU acceleration when available. The pipeline is configured to request 1 GPU per KRONOS job. If no GPU is available, it falls back to CPU (which is significantly slower).
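For images with many channels, the JSON string for --kronos_marker_mapping can be generated rather than typed by hand. The sketch below strips a fluorophore suffix of the form shown in the COMET example (`MARKER_T - FLUOROPHORE`). That naming pattern is inferred from the two example names only, so the regular expression is an assumption, and the generated mapping should be reviewed against the marker report before use. The function names are hypothetical.

```python
import json
import re

# Assumed COMET naming pattern, inferred from "FOXP3_T - TRITC" and
# "CD3_T - Cy5": marker name, optional "_T", then " - <fluorophore>".
SUFFIX_PATTERN = re.compile(r"^(?P<marker>.+?)(?:_T)?\s+-\s+\S.*$")


def guess_marker_name(channel_name):
    """Strip the assumed fluorophore suffix; return the name unchanged otherwise."""
    match = SUFFIX_PATTERN.match(channel_name)
    return match.group("marker") if match else channel_name


def build_marker_mapping(channel_names):
    """Return a JSON string mapping image channel names to guessed marker names."""
    return json.dumps({name: guess_marker_name(name) for name in channel_names})


if __name__ == "__main__":
    channels = ["DAPI", "FOXP3_T - TRITC", "CD3_T - Cy5"]
    print(build_marker_mapping(channels))
```

When pasting the result into a shell command, wrap it in single quotes as in the examples above.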
For more information about KRONOS, see the KRONOS GitHub repository.
CellSAM is a foundation model for cell segmentation that works across different imaging modalities. To use CellSAM as your segmentation algorithm, specify a samplesheet like so:
sample,run_backsub,run_cellsam,tiff,nuclear_channel,membrane_channels
sample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8
sample2,false,true,/path/to/sample2.tiff,DAPI,CD45

The nuclear channel field accepts only one entry; membrane channels may have multiple values separated by : characters. If a channel name contains spaces, surround that name with quotes.
CellSAM uses a tiling approach for large images and supports the following parameters:
- --cellsam_bbox_threshold (default: 0.4): Confidence threshold for cell detection
- --cellsam_block_size (default: 1024): Size of tiles for processing
- --cellsam_overlap (default: 56): Tile overlap for merging
- --cellsam_iou_threshold (default: 0.5): IOU threshold for label merging
- --cellsam_use_wsi (default: true): Enable tiling for large images
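To make the IOU threshold more concrete, the sketch below computes intersection over union for two cell masks represented as sets of pixel coordinates and compares it with a threshold. It illustrates the metric only: how CellSAM applies the threshold when merging labels across overlapping tiles, including whether the comparison is inclusive, is an assumption here, and the function names are hypothetical.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as iterables of (row, col) pixels."""
    a, b = set(mask_a), set(mask_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def should_merge(mask_a, mask_b, threshold=0.5):
    """Decide whether two labels overlap enough to be treated as the same cell.

    Assumption: labels are merged when IoU >= threshold; the real tool may
    use a strict comparison or additional criteria.
    """
    return iou(mask_a, mask_b) >= threshold


if __name__ == "__main__":
    cell_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
    cell_b = {(0, 1), (1, 1), (0, 2), (1, 2)}
    print(iou(cell_a, cell_b), should_merge(cell_a, cell_b))
```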
CellSAM can automatically download the latest model weights (v1.2) from users.deepcell.org. To use the latest weights:
- Create an account at users.deepcell.org
- Generate your access token
- Set it as a Nextflow secret:
nextflow secrets set DEEPCELL_ACCESS_TOKEN $YOUR_TOKEN
If the token is not set, CellSAM will use the default bundled model weights.
Note
You cannot run both Mesmer/Cellpose and CellSAM segmentation on the same sample (with the same name). If you want to run multiple methods on a sample, put it on a different line and give it a different sample name.
You can run the pipeline with different profiles for different size images:
- small: for images <150 GB
- medium: for images <300 GB
- large: for images <600 GB
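The size thresholds above can be turned into a simple lookup. This is a minimal sketch, assuming the image size is already known in GB; the README does not say how size is measured (for example, file size on disk versus uncompressed size), and `choose_profile` is a hypothetical helper.

```python
# Upper bounds in GB for each profile, as listed above (strictly less than).
PROFILE_LIMITS_GB = (("small", 150), ("medium", 300), ("large", 600))


def choose_profile(image_size_gb):
    """Return the smallest profile whose limit exceeds the image size, or None."""
    for name, limit in PROFILE_LIMITS_GB:
        if image_size_gb < limit:
            return name
    # Images of 600 GB or more are not covered by the listed profiles.
    return None


if __name__ == "__main__":
    for size in (80, 150, 450, 700):
        print(size, "GB ->", choose_profile(size))
```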
Warning
If you are combining many membrane channels, using prod as the combine method
may lead to large memory usage. In these cases, it is recommended to use max
instead.
WEHI-SODA-Hub/sp_segment was originally written by the WEHI SODA-Hub.
We thank the following people for their extensive assistance in the development of this pipeline:
- Michael McKay (@mikemcka)
- Emma Watson
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use WEHI-SODA-Hub/sp_segment for your analysis, please cite it using the following doi: 10.5281/zenodo.17103183
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline was created using the nf-core template. You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
