
Add SAM2 few-shot segmentation worker #127

Draft

arjunrajlab wants to merge 3 commits into master from claude/add-sam2-segmentation-3vOWk

Conversation

@arjunrajlab
Collaborator

Summary

This PR introduces a new SAM2 few-shot segmentation worker that enables automatic polygon segmentation based on training annotations. The worker uses SAM2's image encoder to extract feature vectors from training examples, creates a prototype representation, and then applies similarity-based filtering to automatically segment similar objects in other frames.

Key Changes

  • New Worker Implementation (workers/annotations/sam2_fewshot_segmentation/)

    • entrypoint.py: Main worker logic implementing two-phase pipeline:
      • Phase 1: Extract and average feature vectors from training annotations to create a prototype
      • Phase 2: Generate candidate masks with SAM2's automatic mask generator, filter them by cosine similarity to the prototype, and convert the surviving masks to polygon annotations (the prototype-and-filter step is sketched after this list)
    • environment.yml: Conda environment with required dependencies (Python 3.10, SAM2, scikit-image, etc.)
    • Dockerfile and Dockerfile_M1: Container definitions for standard and ARM64 architectures
  • Build Configuration

    • Updated build_machine_learning_workers.sh to include SAM2 few-shot segmentation worker build step
  • Comprehensive Test Suite (workers/annotations/sam2_fewshot_segmentation/tests/)

    • Unit tests for core utility functions:
      • extract_crop_with_context: Context-aware crop extraction with occupancy targeting
      • pool_features_with_mask: Weighted feature pooling using binary masks
      • ensure_rgb: Image format normalization
      • annotation_to_mask: Polygon to binary mask conversion
      • interface: Worker interface configuration
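The core of the two-phase pipeline boils down to a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical (not necessarily those used in entrypoint.py), and only the averaging of normalized feature vectors and the cosine-similarity threshold (default 0.5) are taken from the description above.

```python
import numpy as np

def build_prototype(feature_vectors):
    """Phase 1 (sketch): average L2-normalized per-annotation feature
    vectors into a single prototype vector."""
    feats = np.stack(feature_vectors, axis=0)                      # (N, C)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    prototype = feats.mean(axis=0)
    return prototype / np.linalg.norm(prototype)

def filter_candidates(candidate_features, prototype, threshold=0.5):
    """Phase 2 (sketch): keep the indices of candidate masks whose pooled
    feature vector has cosine similarity >= threshold with the prototype."""
    keep = []
    for idx, feat in enumerate(candidate_features):
        feat = feat / np.linalg.norm(feat)
        if float(np.dot(feat, prototype)) >= threshold:            # cosine similarity
            keep.append(idx)
    return keep
```

Candidates that pass the filter are the ones converted to polygon annotations; everything below the threshold is discarded.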

Notable Implementation Details

  • Feature Extraction: Uses SAM2's image encoder output (image_embed), a 256-channel, 64×64 feature map, as the semantic feature representation
  • Prototype Learning: Averages normalized, mask-pooled feature vectors from all training annotations to create a robust prototype (pooling sketched after this list)
  • Similarity Filtering: Filters candidate masks by cosine similarity to the prototype, with a configurable threshold (default 0.5)
  • Context-Aware Cropping: Extracts crops with configurable occupancy ratio to provide appropriate context for feature extraction
  • Configurable Parameters: User-facing interface includes model selection, similarity threshold, target occupancy, points per side, mask area bounds, and smoothing
  • Batch Processing: Supports processing multiple XY, Z, and Time coordinates in a single run
  • Tag-Based Training: Supports filtering training annotations by tags with exclusive/inclusive modes
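For the mask-weighted pooling step, here is a minimal sketch assuming the 256×64×64 image_embed layout noted above and a binary mask at image resolution; the worker's actual pool_features_with_mask may differ in detail.

```python
import numpy as np
from skimage.transform import resize

def pool_features_with_mask(feature_map, mask):
    """Sketch: average a (C, H, W) feature map over the spatial locations
    covered by a binary mask, using the downscaled mask as weights.

    feature_map: (256, 64, 64) array, e.g. SAM2's image_embed
    mask:        binary array at the original image resolution
    """
    c, h, w = feature_map.shape
    # Downscale the mask to feature-map resolution and use it as soft weights.
    weights = resize(mask.astype(float), (h, w), order=1, anti_aliasing=True)
    total = weights.sum()
    if total == 0:
        # Degenerate mask: fall back to global average pooling.
        return feature_map.reshape(c, -1).mean(axis=1)
    return (feature_map * weights[None]).reshape(c, -1).sum(axis=1) / total
```

Pooling with the mask, rather than averaging over the whole crop, keeps background features out of the prototype.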

https://claude.ai/code/session_01SiA9ktGYkhfqo1c4YBqPKw

Implements a NimbusImage annotation worker that segments objects using
few-shot learning with SAM2. Users annotate 5-20 training examples with
a specific tag, and the worker uses SAM2's image encoder features to find
similar objects across the dataset.

Key design:
- Phase 1: Extract SAM2 features from training annotations using
  mask-weighted pooling, averaged into a single prototype vector
- Phase 2: Run SAM2 automatic mask generator on inference images,
  then filter candidates by cosine similarity to the prototype
- Context padding ensures objects occupy ~20% of the crop area for
  consistent feature extraction between training and inference (see the
  sketch after this list)
- Uses SAM2ImagePredictor for proper feature extraction including
  no_mem_embed handling
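The ~20% occupancy target reads as a simple geometric rule: scale the annotation's bounding box by 1/√0.2 ≈ 2.24 in each dimension so the object covers roughly one fifth of the crop. A sketch follows; the hypothetical crop_with_target_occupancy stands in for the worker's extract_crop_with_context, whose exact signature is not shown here.

```python
import numpy as np

def crop_with_target_occupancy(image, bbox, target_occupancy=0.2):
    """Sketch: expand the annotation bounding box so the object covers
    roughly `target_occupancy` of the crop area, clipped to image bounds.

    bbox: (y0, x0, y1, x1) of the annotation in pixel coordinates
    """
    y0, x0, y1, x1 = bbox
    obj_h, obj_w = y1 - y0, x1 - x0
    # obj_area / crop_area = 1 / scale**2, so scale = 1 / sqrt(target).
    scale = 1.0 / np.sqrt(target_occupancy)
    cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
    half_h, half_w = obj_h * scale / 2.0, obj_w * scale / 2.0
    ny0 = max(0, int(round(cy - half_h)))
    nx0 = max(0, int(round(cx - half_w)))
    ny1 = min(image.shape[0], int(round(cy + half_h)))
    nx1 = min(image.shape[1], int(round(cx + half_w)))
    return image[ny0:ny1, nx0:nx1], (ny0, nx0, ny1, nx1)
```

Using the same occupancy target at training and inference time is what keeps the pooled features comparable across the two phases.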

Interface parameters: Training Tag, Model selection, Similarity Threshold,
Target Occupancy, Points per side, Min/Max Mask Area, Smoothing, and
Batch XY/Z/Time for multi-frame processing.

https://claude.ai/code/session_01SiA9ktGYkhfqo1c4YBqPKw
…ot a dict

The 'type': 'tags' interface field returns a plain list of strings
(e.g., ["DAPI blob"]), not a dict with 'tags' and 'exclusive' keys.
Updated parsing to handle this correctly, matching the pattern used
by other workers (connect_to_nearest, cellpose_train, piscis).

Also added early validation if no training tag is selected.
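A minimal sketch of the corrected pattern, assuming the params['workerInterface'] layout described in the CLAUDE.md commit below and the "Training Tag" parameter name from the interface; the error handling here is illustrative, not the worker's actual reporting mechanism.

```python
worker_interface = params['workerInterface']

# A 'tags'-type field comes back as a plain list of strings, e.g. ["DAPI blob"].
training_tags = worker_interface.get('Training Tag', [])

# Incorrect (old) assumption -- there is no dict wrapper to unpack:
#   training_tags = worker_interface['Training Tag']['tags']

# Early validation: stop before any heavy SAM2 work if no tag was selected.
if not training_tags:
    raise ValueError("No training tag selected for few-shot segmentation.")
```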

https://claude.ai/code/session_01SiA9ktGYkhfqo1c4YBqPKw
CLAUDE.md: Added "Interface Parameter Data Types" section documenting
what each interface type returns in params['workerInterface'], with
emphasis on the common pitfall that 'tags' type returns a plain list
of strings, not a dict. Includes correct/incorrect code examples and
patterns for validation and annotation filtering.

SAM2_FEWSHOT.md: Added comprehensive worker documentation covering
algorithm overview, parameter tuning guide, design decisions, and
a TODO list for future work (tiled image support, multiple prototypes,
full-image encoding optimization, negative examples, etc.).

https://claude.ai/code/session_01SiA9ktGYkhfqo1c4YBqPKw