
Checkpoint incompatibility with repository code - tensor dimension mismatch #10

@junming732

Description


Hello! Thank you for this excellent work on FastSAM3D.

I'm trying to use the FastSAM3D checkpoint from HuggingFace (https://huggingface.co/techlove/FastSAM3D, first download link "FASTSAM3D") with the code from this repository, but encountering a tensor dimension mismatch error.

Environment:

Error:

  File "segment_anything/modeling/mask_decoder3D.py", line 407, in predict_masks
    src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (768) must match the size of tensor b (8) at non-singleton dimension 5
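For context, this is an ordinary broadcasting failure on the trailing (channel) dimension: one operand carries 768 channels where the other carries 8. A minimal NumPy sketch of the same failure mode (the shapes below are illustrative guesses, not the actual FastSAM3D decoder tensors):

```python
import numpy as np

# Illustrative shapes only: a 3D feature map with 768 channels in the
# last dimension, and a dense embedding whose last dimension is 8.
# These are guesses at the failing layout, not the real model tensors.
src = np.zeros((1, 2, 4, 4, 4, 768))
dense_prompt_embeddings = np.zeros((1, 2, 4, 4, 4, 8))

try:
    _ = src + dense_prompt_embeddings
except ValueError as exc:
    # NumPy rejects the add because the trailing dimensions (768 vs 8)
    # are unequal and neither is 1 -- the same kind of mismatch PyTorch
    # flags at dimension 5 in the traceback above.
    print(exc)
```

Since broadcasting only succeeds when corresponding dimensions are equal or one of them is 1, the error strongly suggests the checkpoint's decoder and the instantiated decoder disagree on a channel width.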

What I've tried:

- Confirmed the checkpoint has a 6-layer encoder (student model)
- Built the model with a 6-layer ImageEncoderViT3D
- Used the vit_b_ori model type (as specified in the checkpoint args)
- Loaded with strict=False
- The error occurs in the mask decoder, not the encoder

Checkpoint inspection shows:

```python
args.model_type = 'vit_b_ori'
args.checkpoint = './work_dir/SAM/sam_med3d_oringin.pth'  # Teacher checkpoint
```

Encoder has 6 blocks (layers) - confirmed
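One way to narrow this down is to diff the checkpoint's tensor shapes against a freshly built model's state_dict; whichever decoder parameters disagree will point at the architectural difference. A framework-free sketch (the parameter names and shapes below are hypothetical examples, not actual FastSAM3D keys):

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Compare two {param_name: shape} dicts and report disagreements.

    In practice the dicts would be built from the loaded checkpoint and
    from model.state_dict(), e.g.
        {k: tuple(v.shape) for k, v in state_dict.items()}
    Here they are plain dicts so the sketch stays self-contained.
    """
    mismatched = {
        k: (ckpt_shapes[k], model_shapes[k])
        for k in ckpt_shapes.keys() & model_shapes.keys()
        if ckpt_shapes[k] != model_shapes[k]
    }
    missing = sorted(model_shapes.keys() - ckpt_shapes.keys())
    unexpected = sorted(ckpt_shapes.keys() - model_shapes.keys())
    return mismatched, missing, unexpected


# Hypothetical example: a decoder weight whose channel count disagrees.
ckpt = {"mask_decoder.transformer.layer0.weight": (768, 768)}
model = {
    "mask_decoder.transformer.layer0.weight": (8, 768),
    "mask_decoder.extra.bias": (8,),
}
mismatched, missing, unexpected = find_shape_mismatches(ckpt, model)
print(mismatched)   # shows the 768-vs-8 style disagreement
print(missing)      # params the model expects but the checkpoint lacks
```

Note that strict=False silently skips missing/unexpected keys but does not protect against same-named parameters with different shapes, which is consistent with the error surfacing only at inference time.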
Questions:

1. Does the HuggingFace checkpoint match the current repository code, or was it trained with a modified version?
2. Which specific commit/branch should be used with this checkpoint?
3. Are there additional architectural modifications needed beyond the 6-layer encoder?

I noticed Issue #6 mentions different attention mechanisms (woatt vs flash attention). Could this be related? Any guidance would be greatly appreciated!
