Add multitask attention-MIL training with K-fold cross-validation, binary evaluation helpers, and survival risk stratification#4

Draft
Copilot wants to merge 13 commits into main from copilot/integrate-k-fold-validation

Conversation

Copilot AI commented Feb 8, 2026

Adds three major capabilities: (1) a multitask regression pipeline with optional K-fold CV, (2) helpers to evaluate regression models as binary classifiers without retraining, and (3) survival risk scoring and stratification via Cox PH.

Multitask training (src/stamp/modeling/multitask.py)

  • LitAttnMILMultiTask — shared AttnMIL backbone → N independent regression heads, configurable MSE/Huber loss with per-head weighting
  • MultitaskDataset — patient-level dataset accepting arbitrary number of targets via target_labels: dict[str, int]
  • train_multitask_() / crossval_multitask_() — single split or GroupKFold by patient ID with deterministic per-fold seeds
  • Per-fold patient-preds.csv with {target}_true / {target}_pred columns
  • Slide table support via _build_patient_feature_map() — resolves feature files by FILENAME→PATIENT mapping when files are in subdirectories or named differently from patient IDs
  • CLI: stamp --config <yaml> train_multitask

Example config:
multitask_training:
  target_labels: { scarHRD: 1, TMB: 1, CLOVAR_D: 1 }
  slide_table: "/path/to/slide.csv"  # optional
  crossval:
    enabled: true
    n_splits: 5
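The patient-level splitting with deterministic per-fold seeds described above can be sketched in plain Python. The function name and signature here are illustrative, not the actual STAMP API; the real crossval_multitask_() uses GroupKFold semantics, which this stride-based partition approximates:

```python
# Hypothetical sketch of patient-level K-fold splitting with deterministic
# per-fold seeds (seed = base_seed + fold index). Names are illustrative.
import random

def patient_group_kfold(patient_ids, n_splits=5, base_seed=42):
    """Yield (fold_idx, fold_seed, train_patients, val_patients)."""
    patients = sorted(set(patient_ids))   # group by patient, not by slide
    rng = random.Random(base_seed)
    rng.shuffle(patients)
    folds = [patients[i::n_splits] for i in range(n_splits)]
    for fold_idx, val_patients in enumerate(folds):
        train = [p for f in folds if f is not val_patients for p in f]
        yield fold_idx, base_seed + fold_idx, train, set(val_patients)

# Slides from the same patient always land in the same fold: no leakage.
for fold_idx, seed, train, val in patient_group_kfold(
    ["P1", "P1", "P2", "P3", "P4", "P5"], n_splits=3
):
    assert not set(train) & val
```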

Binary evaluation from regression (src/stamp/statistics/binary_from_regression.py)

Wraps existing inference output to evaluate as a binary classifier — no retraining needed.

  • binarize_labels() — maps "pos"/"neg", True/False, and 0/1 labels to {0, 1}
  • extract_score() — scalar, dict, tuple, 2-D array model outputs
  • postprocess_score() — identity / sigmoid / callable, clip to [0,1]
  • aggregate_patient_scores() — multi-image → patient-level
  • get_thresholds() — Youden, mean, median, fixed, quantile strategies
  • evaluate_thresholds() → (summary_df, preds_df) with accuracy, F1, AUROC, sensitivity, specificity
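The Youden strategy among the thresholding options can be illustrated with a minimal numpy-only search; the real get_thresholds() helper's signature may differ:

```python
# Minimal sketch of a Youden-style threshold search: pick the score cutoff
# maximising J = sensitivity + specificity - 1 over the observed scores.
import numpy as np

def youden_threshold(y_true, scores):
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & y_true).sum() / max(y_true.sum(), 1)      # sensitivity
        fpr = (pred & ~y_true).sum() / max((~y_true).sum(), 1)  # 1 - specificity
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

t = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.35
```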

Survival risk stratification (src/stamp/statistics/survival_risk.py)

Cox-on-top-of-score approach using lifelines (already a dependency).

  • fit_cox_model() — CoxPH on score + optional clinical covariates → CoxResult with risk scores and C-index
  • get_survival_thresholds() — median, mean, max log-rank (searches candidate cutpoints excluding <10%/>90% quantiles), fixed
  • assign_risk_groups() — dichotomize into high_risk / low_risk (or custom labels like HRDpos/HRDneg)
  • evaluate_survival_stratification() — C-index, log-rank p per threshold strategy
  • plot_km_by_group() — KM curves with at-risk counts and annotation
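The "max log-rank" cutpoint search above can be sketched with a small numpy implementation of the two-group log-rank statistic; function names here are hypothetical, and candidates are restricted to score quantiles between the 10th and 90th percentile as described:

```python
# Illustrative numpy-only log-rank cutpoint search (not the module's API).
import numpy as np

def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (group: boolean mask)."""
    o1 = e1 = var = 0.0
    events = event.astype(bool)
    for t in np.unique(time[events]):
        at_risk = time >= t
        n1, n = (at_risk & group).sum(), at_risk.sum()
        d = (events & (time == t)).sum()
        d1 = (events & (time == t) & group).sum()
        o1 += d1
        e1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (o1 - e1) ** 2 / var if var > 0 else 0.0

def max_logrank_cutpoint(score, time, event, n_candidates=20):
    # Candidate cutpoints between the 10% and 90% score quantiles.
    candidates = np.quantile(score, np.linspace(0.1, 0.9, n_candidates))
    return max(candidates, key=lambda c: logrank_chi2(time, event, score > c))
```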

Beartype fix

LitSlideRegressor._step: batch: tuple[Tensor, Tensor] → tuple[Tensor, ...] | list[Tensor] — DataLoader returns lists.

Config

MultitaskTrainingConfig / MultitaskCrossvalConfig in modeling/config.py, wired into StampConfig.multitask_training. CV disabled by default — existing runs unaffected.

Tests

  • 10 tests for multitask (dataset, model, single train, CV, many targets, subdirs, slide table)
  • 41 tests for binary evaluation helpers
  • 19 tests for survival risk stratification
Original prompt

You are a senior Python/ML engineer. Your task is to create a new git branch in my STAMP repo and integrate K-fold cross-validation into the existing multitask training pipeline (stamp ... train_multitask), matching the style/behavior of the original STAMP cross-validation as closely as reasonable. Work like a careful codebase integrator: inspect first, then implement minimal, well-scoped changes, then add docs + a small test.
<repo_context>

Project: STAMP / lightspeed (Python 3.12, PyTorch Lightning)
Command: stamp --config train_multitask
Multitask config block: multitask_training: (example below)
Current behavior: single train/val split, trains AttnMILMultiTask, saves best model by val_loss.
Known issue: beartype warns because batch is annotated as tuple[torch.Tensor, torch.Tensor] but a list[...] arrives in training_step/validation_step/_step. This should be fixed cleanly in the same branch without breaking behavior.
Time context: Sunday, February 08, 2026 UTC
</repo_context>

<current_config_example>
multitask_training:
  output_dir: "/mnt/bulk-saturn/gurcan/ovar/multitaskregression"
  clini_table: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/TCGA_OV_bio_norm_outlier.csv"
  slide_table: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/TCGA_OV_bio_slide.csv"
  feature_dir: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/features/uni2/uni2-a68012e1"
  patient_label: "PATIENT"
  filename_label: "FILENAME"
  target_labels:
    scarHRD: 1
    TMB: 1
    CLOVAR_D: 1
    CLOVAR_I: 1
    CLOVAR_M: 1
    CLOVAR_P: 1
  loss_weights:
    scarHRD: 1.0
    TMB: 0.1
    CLOVAR_D: 0.3
    CLOVAR_I: 0.3
    CLOVAR_M: 0.3
    CLOVAR_P: 0.3
  emb_dim: 256
  bag_size: 512
  batch_size: 16
  num_workers: 4
  max_epochs: 64
  patience: 16
  max_lr: 1.0e-4
  div_factor: 25.0
  loss_type: "mse"      # "huber" or "mse"
  huber_delta: 1.0
  accelerator: "gpu"    # "gpu" or "cpu"
  seed: 42
</current_config_example>
<new_config_requirements>
Add optional cross-validation keys (default disabled so existing runs behave identically):
multitask_training:
  crossval:
    enabled: true|false
    n_splits: 5
    fold_index: null            # optional: run only one fold (debug/HPC)
    shuffle: true
    random_state: 42
    stratify_target: null | "scarHRD" | "TMB" | "CLOVAR_D" | ...
    stratify_bins: 5            # for regression stratification via quantile binning (optional)
Output layout (example):

output_dir/
  fold_0/
    model.ckpt
    metrics.json (or yaml)
    fold_split.json (patient ids / indices)
    zscore_stats.pt (or similar)
  fold_1/
    ...
  crossval_summary.json (aggregate mean/std + metadata)
</new_config_requirements>

<implementation_guidelines>

First, locate the original STAMP cross-validation implementation in the repo (search for: crossval, kfold, GroupKFold, StratifiedKFold, split, fold, patient). Mirror its conventions (seeding, outputs, CLI patterns) where feasible.
Keep changes minimal and well-contained:

Either add a new runner entrypoint (e.g. train_multitask_crossval) OR add a crossval mode inside train_multitask gated by config.

Use deterministic seeds per fold: e.g. seed + fold_id.
Use patient-level splitting:

Prefer GroupKFold by patient id.
If stratification is enabled for regression, implement a robust approach (e.g. bin the selected target into quantiles, then use a grouped/stratified strategy). If too complex, document limitations and implement non-stratified GroupKFold first.

Logging/metrics:

Log overall val_loss and (if available) per-head losses for each fold.

Fix the beartype warning properly:

Option A: make dataset/collate return (x, y) tuples consistently.
Option B: change type hints to accept Sequence[Tensor] or tuple[Tensor, ...] | list[Tensor].
Add a small runtime assertion to ensure expected structure (length == 2, types are Tensor/dict as expected).
</implementation_guidelines>
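The quantile-binning idea from the guidelines above can be sketched with numpy alone: bin the continuous target into quantiles so a grouped and stratified splitter (e.g. sklearn's StratifiedGroupKFold) can consume the bins as class labels. The function name is illustrative:

```python
# Bin a continuous regression target into quantile classes for stratification.
import numpy as np

def quantile_bins(target, n_bins=5):
    """Map a continuous target to integer bins 0..n_bins-1 by quantile."""
    # Interior quantile edges, e.g. [0.2, 0.4, 0.6, 0.8] for n_bins=5.
    edges = np.quantile(target, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(target, edges)

bins = quantile_bins(np.arange(100), n_bins=5)  # five bins of 20 values each
```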

<beartype_issue_log>
Warnings like:

LitAttnMILMultiTask._step() parameter batch expected tuple[torch.Tensor, torch.Tensor]
but got list[tensor, ...] in training_step/validation_step
Fix in this branch.
</beartype_issue_log>

<how_to_respond>

Start by listing the relevant files/modules you found (paths) and where you plan to hook cross-validation.
Provide a step-by-step implementation plan.
Then provide concrete diffs / code changes file-by-file (include new files too).
Finish with exact commands to run:

how to run single-fold debug
how to run full CV
how to verify outputs and correctness (no leakage, no patient overlap)
If something is unclear/missing in the repo, ask targeted questions, but proceed with a reasonable default design (GroupKFold by patient_label) so progress continues.
</how_to_respond>
Do not output a bibliography or list of sources at the end.


