Hello! Thank you for this excellent work on FastSAM3D.
I'm trying to use the FastSAM3D checkpoint from HuggingFace (https://huggingface.co/techlove/FastSAM3D, first download link "FASTSAM3D") with the code from this repository, but I'm encountering a tensor dimension mismatch error.
Environment:
- PyTorch 2.6
- Repository: https://github.com/arcadelab/FastSAM3D (cloned Nov 2024)
- Checkpoint: fastsam3d.pth from HuggingFace (first download link)
Error:
File "segment_anything/modeling/mask_decoder3D.py", line 407, in predict_masks
    src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (768) must match the size of tensor b (8) at non-singleton dimension 5
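As far as I can tell, this is PyTorch's standard broadcasting failure: the trailing dimension of src (768, the ViT-B embedding width) does not line up with the trailing dimension of dense_prompt_embeddings (8). A minimal reproduction with illustrative shapes (these are not the actual shapes inside mask_decoder3D.py, just 6-D tensors that mismatch at dimension 5):

```python
import torch

# Two 6-D tensors whose last (non-singleton) dimension differs: 768 vs 8.
# Broadcasting cannot reconcile them, so the addition raises RuntimeError,
# matching the message seen in predict_masks.
src = torch.zeros(1, 1, 1, 1, 1, 768)
dense_prompt_embeddings = torch.zeros(1, 1, 1, 1, 1, 8)

try:
    _ = src + dense_prompt_embeddings
except RuntimeError as e:
    print(e)  # "The size of tensor a (768) must match the size of tensor b (8) ..."
```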
What I've tried:
Confirmed checkpoint has 6-layer encoder (student model)
Built model with 6-layer ImageEncoderViT3D
Used vit_b_ori model type (as specified in checkpoint args)
Loaded with strict=False
The error occurs in the mask decoder, not the encoder
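Since I loaded with strict=False, I also checked the return value of load_state_dict, which lists the keys that silently failed to match. A toy sketch of that diagnostic (the model below is a stand-in for illustration, not FastSAM3D):

```python
import torch
import torch.nn as nn

# Stand-in model: one "encoder" and one "decoder" parameter group.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

model = Toy()

# Pretend checkpoint: encoder weights present, decoder weights absent,
# plus one stray key the model does not have.
ckpt = {
    "encoder.weight": torch.zeros(4, 4),
    "encoder.bias": torch.zeros(4),
    "extra.weight": torch.zeros(4, 4),
}

# With strict=False the mismatches are reported instead of raising.
result = model.load_state_dict(ckpt, strict=False)
print("missing keys:", result.missing_keys)        # decoder.* never got loaded
print("unexpected keys:", result.unexpected_keys)  # ['extra.weight']
```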
Checkpoint inspection shows:
args.model_type = 'vit_b_ori'
args.checkpoint = './work_dir/SAM/sam_med3d_oringin.pth'  # Teacher checkpoint
Encoder has 6 blocks (layers) - confirmed
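For reference, the block count above was obtained roughly like this. The key pattern image_encoder.blocks.<i>. is an assumption based on SAM-style parameter naming, and a synthetic state dict stands in for torch.load("fastsam3d.pth"):

```python
import torch

# Synthetic state dict mimicking a checkpoint with a 6-block image encoder
# plus one decoder parameter (values are placeholders).
state = {f"image_encoder.blocks.{i}.attn.qkv.weight": torch.zeros(1)
         for i in range(6)}
state["mask_decoder.iou_token.weight"] = torch.zeros(1)

# Extract the block index from keys like 'image_encoder.blocks.5.attn.qkv.weight'.
block_ids = {int(k.split("blocks.")[1].split(".")[0])
             for k in state if ".blocks." in k}
print(sorted(block_ids))  # [0, 1, 2, 3, 4, 5] -> 6-layer student encoder
```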
Questions:
Does the HuggingFace checkpoint match the current repository code, or was it trained with a modified version?
Which specific commit/branch should be used with this checkpoint?
Are there additional architectural modifications needed beyond the 6-layer encoder?
I noticed Issue #6 mentions different attention mechanisms (woatt vs. flash attention). Could this be related?

Any guidance would be greatly appreciated!