Copilot AI commented Jan 7, 2026

Updates the "Video Chunk Sampling" section in docs/model_card.md to reflect the actual API, which uses patch_positions instead of visible_indices.

Changes

  • Updated flow diagram to reference patch_positions for temporal mapping
  • Renamed mechanism section from visible_indices to patch_positions
  • Rewrote the code example to show how to build a [batch_size, seq_len, 3] tensor of [t, h, w] coordinates

Example

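import torch

# Build patch_positions: [batch_size, seq_len, 3] with [t, h, w] for each patch.
# Shapes inferred from the expand calls: frame_pos is [num_frames];
# h_ids and w_ids are [frame_tokens]; seq_len = num_frames * frame_tokens.
t_positions = frame_pos.unsqueeze(-1).expand(-1, frame_tokens).reshape(-1)  # [seq_len]
h_positions = h_ids.unsqueeze(0).expand(num_frames, -1).reshape(-1)  # [seq_len]
w_positions = w_ids.unsqueeze(0).expand(num_frames, -1).reshape(-1)  # [seq_len]

# Stack the three coordinates per patch, then add the batch dimension.
patch_positions = torch.stack([t_positions, h_positions, w_positions], dim=-1)  # [seq_len, 3]
patch_positions = patch_positions.unsqueeze(0)  # [1, 4096, 3]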
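
The example above relies on frame_pos, h_ids, w_ids, frame_tokens, and num_frames being defined elsewhere in the model card. For reviewers who want to check the shapes locally, here is a minimal self-contained sketch. The concrete sizes (16 frames, a 16 x 16 patch grid), the names grid_h and grid_w, the use of consecutive frame indices for frame_pos, and the row-major patch layout are all assumptions made for illustration. They are chosen only so that seq_len comes out to 4096, matching the shape comment in the example, and the values in docs/model_card.md may differ.

```python
import torch

# Hypothetical sizes: 16 frames * (16 * 16) patches = 4096 tokens, chosen to
# match the [1, 4096, 3] shape comment in the PR example.
num_frames = 16
grid_h, grid_w = 16, 16
frame_tokens = grid_h * grid_w

# Assumption: one temporal index per sampled frame (here simply 0..15).
frame_pos = torch.arange(num_frames)  # [num_frames]

# Assumption: patches inside a frame are laid out row-major.
h_ids = torch.arange(grid_h).repeat_interleave(grid_w)  # [frame_tokens]
w_ids = torch.arange(grid_w).repeat(grid_h)             # [frame_tokens]

# Same construction as in the PR example.
t_positions = frame_pos.unsqueeze(-1).expand(-1, frame_tokens).reshape(-1)
h_positions = h_ids.unsqueeze(0).expand(num_frames, -1).reshape(-1)
w_positions = w_ids.unsqueeze(0).expand(num_frames, -1).reshape(-1)

patch_positions = torch.stack([t_positions, h_positions, w_positions], dim=-1)
patch_positions = patch_positions.unsqueeze(0)  # add batch dimension

print(patch_positions.shape)  # torch.Size([1, 4096, 3])
```

Under these assumptions, token 17 of the sequence carries [t, h, w] = [0, 1, 1] and token 256 is the first patch of the second frame, [1, 0, 0].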
Original prompt

Please update the "Video Chunk Sampling" section in docs/model_card.md first, changing the input method from visible_indices to patch_positions.



Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update video chunk sampling section in model card" to "docs: update Video Chunk Sampling section to use patch_positions" on Jan 7, 2026
Copilot AI requested a review from anxiangsir January 7, 2026 05:29