I initially thought bev_embed_map refers to embedded map features, but after carefully reading the code, I believe it is actually derived from image features.
🔄 Workflow Summary
1. Input:
mlvl_feats (image features from backbone)
2. Processing:
Image features are flattened and passed to the map encoder:
mlvl_feats -> feat_flatten -> (self.map_encoder()) -> bev_embed_map
3. Masking using local map:
After bev_embed_map is computed, every BEV cell whose maximum local_map value (taken over the last dimension) is below 0.8 is set to zero:
local_map_mask, _ = torch.max(local_map, dim=-1)
local_map_mask = local_map_mask.permute(1, 0)
local_map_mask = local_map_mask < 0.8
bev_embed_map[local_map_mask] = 0.0
4. Fusion:
bev_embed_map is added to bev_embed. If local_map is provided, it is then concatenated to the result along the channel dimension and passed through self.map_feats_conv:
bev_embed = bev_embed + bev_embed_map
if local_map is not None:
    bev_embed = bev_embed.view(bs, bev_h, bev_w, -1).permute(0, 3, 1, 2).contiguous()
    local_map_reshape = local_map.permute(1, 0, 2).view(bs, bev_h, bev_w, -1).permute(0, 3, 1, 2).contiguous()
    bev_fuse = torch.cat((bev_embed, local_map_reshape), dim=1)
    bev_embed = self.map_feats_conv(bev_fuse)
    bev_embed = bev_embed.flatten(2).permute(0, 2, 1).contiguous()
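To double-check the shapes I inferred for steps 3 and 4, here is a minimal standalone sketch with toy tensors. The shapes (bev_embed and bev_embed_map as (bs, bev_h*bev_w, C), local_map as (bev_h*bev_w, bs, C_map)) are my assumption based on the permute/view calls above, and the 1x1 `nn.Conv2d` is only a hypothetical stand-in for self.map_feats_conv, whose real definition I have not confirmed.

```python
import torch
import torch.nn as nn

bs, bev_h, bev_w = 1, 2, 2
embed_dim, map_channels = 4, 3

# Toy stand-ins for the real tensors (shapes are assumed, see above).
bev_embed = torch.ones(bs, bev_h * bev_w, embed_dim)
bev_embed_map = torch.ones(bs, bev_h * bev_w, embed_dim)
local_map = torch.tensor([
    [[0.90, 0.10, 0.00]],   # max 0.90 -> kept
    [[0.20, 0.30, 0.10]],   # max 0.30 -> zeroed
    [[0.00, 0.00, 0.85]],   # max 0.85 -> kept
    [[0.50, 0.79, 0.00]],   # max 0.79 -> zeroed
])                          # (bev_h*bev_w, bs, map_channels)

# Hypothetical stand-in for self.map_feats_conv: maps the concatenated
# channels back to embed_dim. The actual layer in the repo may differ.
map_feats_conv = nn.Conv2d(embed_dim + map_channels, embed_dim, kernel_size=1)

# Step 3: zero out BEV cells where no map channel reaches 0.8.
local_map_mask, _ = torch.max(local_map, dim=-1)   # (bev_h*bev_w, bs)
local_map_mask = local_map_mask.permute(1, 0)      # (bs, bev_h*bev_w)
local_map_mask = local_map_mask < 0.8
bev_embed_map[local_map_mask] = 0.0
kept_cells = (bev_embed_map != 0).any(dim=-1)      # (bs, bev_h*bev_w)

# Step 4: add, then concatenate local_map as extra channels and convolve.
bev_embed = bev_embed + bev_embed_map
summed = bev_embed.clone()                         # copy kept for inspection
bev_embed = bev_embed.view(bs, bev_h, bev_w, -1).permute(0, 3, 1, 2).contiguous()
local_map_reshape = (
    local_map.permute(1, 0, 2).view(bs, bev_h, bev_w, -1).permute(0, 3, 1, 2).contiguous()
)
bev_fuse = torch.cat((bev_embed, local_map_reshape), dim=1)  # (bs, C + C_map, H, W)
bev_embed = map_feats_conv(bev_fuse)
bev_embed = bev_embed.flatten(2).permute(0, 2, 1).contiguous()  # (bs, H*W, C)

print(kept_cells)
print(bev_fuse.shape, bev_embed.shape)
```

If these shapes are right, local_map enters the result twice: once as a binary mask on bev_embed_map, and once as raw channels concatenated before the convolution.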
Please let me know if my understanding is correct. If so, which variable holds the embedded map features?
Thanks!