Add multitask attention-MIL training with K-fold cross-validation, binary evaluation helpers, and survival risk stratification#4

Draft
Copilot wants to merge 13 commits into main from copilot/integrate-k-fold-validation

Conversation

Copilot AI commented Feb 8, 2026

Adds three major capabilities: (1) a multitask regression pipeline with optional K-fold CV, (2) helpers to evaluate regression models as binary classifiers without retraining, and (3) survival risk scoring and stratification via Cox PH.

Multitask training (src/stamp/modeling/multitask.py)

  • LitAttnMILMultiTask — shared AttnMIL backbone → N independent regression heads, configurable MSE/Huber loss with per-head weighting
  • MultitaskDataset — patient-level dataset accepting arbitrary number of targets via target_labels: dict[str, int]
  • train_multitask_() / crossval_multitask_() — single split or GroupKFold by patient ID with deterministic per-fold seeds
  • Per-fold patient-preds.csv with {target}_true / {target}_pred columns
  • Slide table support via _build_patient_feature_map() — resolves feature files by FILENAME→PATIENT mapping when files are in subdirectories or named differently from patient IDs
  • CLI: stamp --config <yaml> train_multitask

Example config:
multitask_training:
  target_labels: { scarHRD: 1, TMB: 1, CLOVAR_D: 1 }
  slide_table: "/path/to/slide.csv"  # optional
  crossval:
    enabled: true
    n_splits: 5
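The patient-level splitting with deterministic per-fold seeds described above can be sketched in plain Python. The function name and signature here are illustrative, not the actual STAMP API; the real crossval_multitask_() uses GroupKFold semantics, which this stride-based partition approximates:

```python
# Hypothetical sketch of patient-level K-fold splitting with deterministic
# per-fold seeds (seed = base_seed + fold index). Names are illustrative.
import random

def patient_group_kfold(patient_ids, n_splits=5, base_seed=42):
    """Yield (fold_idx, fold_seed, train_patients, val_patients)."""
    patients = sorted(set(patient_ids))   # group by patient, not by slide
    rng = random.Random(base_seed)
    rng.shuffle(patients)
    folds = [patients[i::n_splits] for i in range(n_splits)]
    for fold_idx, val_patients in enumerate(folds):
        train = [p for f in folds if f is not val_patients for p in f]
        yield fold_idx, base_seed + fold_idx, train, set(val_patients)

# Slides from the same patient always land in the same fold: no leakage.
for fold_idx, seed, train, val in patient_group_kfold(
    ["P1", "P1", "P2", "P3", "P4", "P5"], n_splits=3
):
    assert not set(train) & val
```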

Binary evaluation from regression (src/stamp/statistics/binary_from_regression.py)

Wraps existing inference output to evaluate as a binary classifier — no retraining needed.

  • binarize_labels() — maps "pos"/"neg", True/False, and 0/1 labels to {0, 1}
  • extract_score() — scalar, dict, tuple, 2-D array model outputs
  • postprocess_score() — identity / sigmoid / callable, clip to [0,1]
  • aggregate_patient_scores() — multi-image → patient-level
  • get_thresholds() — Youden, mean, median, fixed, quantile strategies
  • evaluate_thresholds() → (summary_df, preds_df) with accuracy, F1, AUROC, sensitivity, specificity
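The Youden strategy among the thresholding options can be illustrated with a minimal numpy-only search; the real get_thresholds() helper's signature may differ:

```python
# Minimal sketch of a Youden-style threshold search: pick the score cutoff
# maximising J = sensitivity + specificity - 1 over the observed scores.
import numpy as np

def youden_threshold(y_true, scores):
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & y_true).sum() / max(y_true.sum(), 1)      # sensitivity
        fpr = (pred & ~y_true).sum() / max((~y_true).sum(), 1)  # 1 - specificity
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

t = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.35
```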

Survival risk stratification (src/stamp/statistics/survival_risk.py)

Cox-on-top-of-score approach using lifelines (already a dependency).

  • fit_cox_model() — CoxPH on score + optional clinical covariates → CoxResult with risk scores and C-index
  • get_survival_thresholds() — median, mean, max log-rank (searches candidate cutpoints excluding <10%/>90% quantiles), fixed
  • assign_risk_groups() — dichotomize into high_risk / low_risk (or custom labels like HRDpos/HRDneg)
  • evaluate_survival_stratification() — C-index, log-rank p per threshold strategy
  • plot_km_by_group() — KM curves with at-risk counts and annotation
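The "max log-rank" cutpoint search above can be sketched with a small numpy implementation of the two-group log-rank statistic; function names here are hypothetical, and candidates are restricted to score quantiles between the 10th and 90th percentile as described:

```python
# Illustrative numpy-only log-rank cutpoint search (not the module's API).
import numpy as np

def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (group: boolean mask)."""
    o1 = e1 = var = 0.0
    events = event.astype(bool)
    for t in np.unique(time[events]):
        at_risk = time >= t
        n1, n = (at_risk & group).sum(), at_risk.sum()
        d = (events & (time == t)).sum()
        d1 = (events & (time == t) & group).sum()
        o1 += d1
        e1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (o1 - e1) ** 2 / var if var > 0 else 0.0

def max_logrank_cutpoint(score, time, event, n_candidates=20):
    # Candidate cutpoints between the 10% and 90% score quantiles.
    candidates = np.quantile(score, np.linspace(0.1, 0.9, n_candidates))
    return max(candidates, key=lambda c: logrank_chi2(time, event, score > c))
```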

Beartype fix

LitSlideRegressor._step: batch: tuple[Tensor, Tensor] → tuple[Tensor, ...] | list[Tensor] — DataLoader returns lists.

Config

MultitaskTrainingConfig / MultitaskCrossvalConfig in modeling/config.py, wired into StampConfig.multitask_training. CV disabled by default — existing runs unaffected.

Tests

  • 10 tests for multitask (dataset, model, single train, CV, many targets, subdirs, slide table)
  • 41 tests for binary evaluation helpers
  • 19 tests for survival risk stratification
Original prompt

You are a senior Python/ML engineer. Your task is to create a new git branch in my STAMP repo and integrate K-fold cross-validation into the existing multitask training pipeline (stamp ... train_multitask), matching the style/behavior of the original STAMP cross-validation as closely as reasonable. Work like a careful codebase integrator: inspect first, then implement minimal, well-scoped changes, then add docs + a small test.
<repo_context>

Project: STAMP / lightspeed (Python 3.12, PyTorch Lightning)
Command: stamp --config train_multitask
Multitask config block: multitask_training: (example below)
Current behavior: single train/val split, trains AttnMILMultiTask, saves best model by val_loss.
Known issue: beartype warns because batch is annotated as tuple[torch.Tensor, torch.Tensor] but a list[...] arrives in training_step/validation_step/_step. This should be fixed cleanly in the same branch without breaking behavior.
Time context: Sunday, February 08, 2026 UTC
</repo_context>

<current_config_example>
multitask_training:
  output_dir: "/mnt/bulk-saturn/gurcan/ovar/multitaskregression"
  clini_table: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/TCGA_OV_bio_norm_outlier.csv"
  slide_table: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/TCGA_OV_bio_slide.csv"
  feature_dir: "/mnt/bulk-saturn/gurcan/ovar/TCGA_OV_bio/features/uni2/uni2-a68012e1"
  patient_label: "PATIENT"
  filename_label: "FILENAME"
  target_labels:
    scarHRD: 1
    TMB: 1
    CLOVAR_D: 1
    CLOVAR_I: 1
    CLOVAR_M: 1
    CLOVAR_P: 1
  loss_weights:
    scarHRD: 1.0
    TMB: 0.1
    CLOVAR_D: 0.3
    CLOVAR_I: 0.3
    CLOVAR_M: 0.3
    CLOVAR_P: 0.3
  emb_dim: 256
  bag_size: 512
  batch_size: 16
  num_workers: 4
  max_epochs: 64
  patience: 16
  max_lr: 1.0e-4
  div_factor: 25.0
  loss_type: "mse"      # "huber" or "mse"
  huber_delta: 1.0
  accelerator: "gpu"    # "gpu" or "cpu"
  seed: 42
</current_config_example>
<new_config_requirements>
Add optional cross-validation keys (default disabled so existing runs behave identically):
multitask_training:
  crossval:
    enabled: true|false
    n_splits: 5
    fold_index: null            # optional: run only one fold (debug/HPC)
    shuffle: true
    random_state: 42
    stratify_target: null | "scarHRD" | "TMB" | "CLOVAR_D" | ...
    stratify_bins: 5            # for regression stratification via quantile binning (optional)
Output layout (example):

output_dir/
  fold_0/
    model.ckpt
    metrics.json (or yaml)
    fold_split.json (patient ids / indices)
    zscore_stats.pt (or similar)
  fold_1/
    ...
  crossval_summary.json (aggregate mean/std + metadata)
</new_config_requirements>

<implementation_guidelines>

First, locate the original STAMP cross-validation implementation in the repo (search for: crossval, kfold, GroupKFold, StratifiedKFold, split, fold, patient). Mirror its conventions (seeding, outputs, CLI patterns) where feasible.
Keep changes minimal and well-contained:

Either add a new runner entrypoint (e.g. train_multitask_crossval) OR add a crossval mode inside train_multitask gated by config.

Use deterministic seeds per fold: e.g. seed + fold_id.
Use patient-level splitting:

Prefer GroupKFold by patient id.
If stratification is enabled for regression, implement a robust approach (e.g. bin the selected target into quantiles, then use a grouped/stratified strategy). If too complex, document limitations and implement non-stratified GroupKFold first.

Logging/metrics:

Log overall val_loss and (if available) per-head losses for each fold.

Fix the beartype warning properly:

Option A: make dataset/collate return (x, y) tuples consistently.
Option B: change type hints to accept Sequence[Tensor] or tuple[Tensor, ...] | list[Tensor].
Add a small runtime assertion to ensure expected structure (length == 2, types are Tensor/dict as expected).
</implementation_guidelines>
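The quantile-binning idea from the guidelines above can be sketched with numpy alone: bin the continuous target into quantiles so a grouped and stratified splitter (e.g. sklearn's StratifiedGroupKFold) can consume the bins as class labels. The function name is illustrative:

```python
# Bin a continuous regression target into quantile classes for stratification.
import numpy as np

def quantile_bins(target, n_bins=5):
    """Map a continuous target to integer bins 0..n_bins-1 by quantile."""
    # Interior quantile edges, e.g. [0.2, 0.4, 0.6, 0.8] for n_bins=5.
    edges = np.quantile(target, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(target, edges)

bins = quantile_bins(np.arange(100), n_bins=5)  # five bins of 20 values each
```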

<beartype_issue_log>
Warnings like:

LitAttnMILMultiTask._step() parameter batch expected tuple[torch.Tensor, torch.Tensor]
but got list[tensor, ...] in training_step/validation_step
Fix in this branch.
</beartype_issue_log>

<how_to_respond>

Start by listing the relevant files/modules you found (paths) and where you plan to hook cross-validation.
Provide a step-by-step implementation plan.
Then provide concrete diffs / code changes file-by-file (include new files too).
Finish with exact commands to run:

how to run single-fold debug
how to run full CV
how to verify outputs and correctness (no leakage, no patient overlap)
If something is unclear/missing in the repo, ask targeted questions, but proceed with a reasonable default design (GroupKFold by patient_label) so progress continues.
</how_to_respond>
Do not output a bibliography or list of sources at the end.


