Fix mRoPE position ID crash on Qwen2-VL prompt truncation #482

Mr-Neutr0n · 2026-02-09T22:00:18Z

Summary

Fixes #441

When training Qwen2.5-VL with agent-lightning + verl, the model crashes in get_rope_index with a shape mismatch:

position_ids[..., attention_mask == 1] = llm_positions

fails because the length of llm_positions differs from the number of positions where attention_mask == 1.

Root cause: In get_train_data_batch, prompt truncation (prompt_ids[:max_prompt_length]) changes the token count, potentially removing image placeholder tokens. However, image_grid_thw is computed from the original (untruncated) image_urls list. When get_rope_index processes the truncated sequence, it finds fewer <|vision_start|><|image_pad|> regions than image_grid_thw entries, causing the position ID length to diverge from the attention mask count.
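A toy reproduction of the mismatch in plain Python, using made-up text token IDs. The special-token IDs and the merge size of 2 are assumed Qwen2-VL defaults (read them from the model config in practice), and image_placeholder / tokens_match_grids are hypothetical helpers, not functions from agent-lightning or verl. The check encodes the invariant that a well-formed Qwen2-VL prompt carries t * (h // merge_size) * (w // merge_size) <|image_pad|> tokens for each image_grid_thw entry.

```python
from typing import List, Sequence, Tuple

# Assumed Qwen2-VL special-token IDs; read them from the model config in practice.
VISION_START_ID = 151652  # <|vision_start|>
VISION_END_ID = 151653    # <|vision_end|>
IMAGE_PAD_ID = 151655     # <|image_pad|>
MERGE_SIZE = 2            # assumed spatial_merge_size


def image_placeholder(grid_thw: Tuple[int, int, int]) -> List[int]:
    """Token run emitted for one image with this (t, h, w) grid."""
    t, h, w = grid_thw
    n_pad = t * (h // MERGE_SIZE) * (w // MERGE_SIZE)
    return [VISION_START_ID] + [IMAGE_PAD_ID] * n_pad + [VISION_END_ID]


def tokens_match_grids(input_ids: Sequence[int],
                       image_grid_thw: Sequence[Tuple[int, int, int]]) -> bool:
    """True when the <|image_pad|> count equals what image_grid_thw accounts for."""
    actual = sum(1 for tok in input_ids if tok == IMAGE_PAD_ID)
    expected = sum(t * (h // MERGE_SIZE) * (w // MERGE_SIZE) for t, h, w in image_grid_thw)
    return actual == expected


# Toy prompt: text, image 1 (6 pads), text, image 2 (4 pads), text.
grids = [(1, 4, 6), (1, 4, 4)]
prompt_ids = [1, 2, 3] + image_placeholder(grids[0]) + [4, 5] + image_placeholder(grids[1]) + [6]
max_prompt_length = 12

truncated = prompt_ids[:max_prompt_length]  # cuts off image 2 entirely
print(len(prompt_ids), tokens_match_grids(prompt_ids, grids))  # 20 True
print(len(truncated), tokens_match_grids(truncated, grids))    # 12 False
```

The grids still describe two images while the truncated prompt holds only one; keeping just the grids for surviving images (grids[:1] here) restores the invariant.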

Fix: After prompt truncation, count the remaining image regions in the truncated token sequence using the same vision_start_token_id + image_token_id pattern that get_rope_index uses, and slice image_urls to match before computing image_grid_thw.

Added _count_images_in_tokens() helper method to detect image regions in token sequences
Modified the transition-level mRoPE code path to reconcile image_urls with truncated prompts
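A minimal sketch of the reconciliation step, in plain Python. count_images_in_tokens stands in for the _count_images_in_tokens() helper and reconcile_images is a hypothetical wrapper; neither is the actual agent-lightning code, and the token IDs are assumed Qwen2-VL defaults.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumed Qwen2-VL special-token IDs; read them from the model config in practice.
VISION_START_ID = 151652  # <|vision_start|>
IMAGE_PAD_ID = 151655     # <|image_pad|>


def count_images_in_tokens(token_ids: Sequence[int],
                           vision_start_token_id: int = VISION_START_ID,
                           image_token_id: int = IMAGE_PAD_ID) -> int:
    """Count image regions: a vision_start token immediately followed by an image_pad token.

    Videos share the vision_start token but use a different pad token,
    so they are not counted here.
    """
    return sum(
        1
        for i in range(len(token_ids) - 1)
        if token_ids[i] == vision_start_token_id and token_ids[i + 1] == image_token_id
    )


def reconcile_images(prompt_ids: Sequence[int], image_urls: Sequence[T],
                     max_prompt_length: int) -> Tuple[List[int], List[T]]:
    """Truncate the prompt, then keep only the images whose placeholders survive.

    Truncation keeps a prefix, so the surviving regions belong to the first N
    images and image_urls[:N] keeps them in order.
    """
    truncated = list(prompt_ids[:max_prompt_length])
    n_images = count_images_in_tokens(truncated)
    return truncated, list(image_urls[:n_images])
```

image_grid_thw would then be computed from the returned image_urls. As written, the sketch counts an image as present whenever its vision_start and first image_pad token survive, even if the rest of its placeholder run was cut off.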

Test plan

Verify that Qwen2.5-VL training with prompts that exceed max_prompt_length and contain images no longer crashes in get_rope_index
Verify that Qwen2.5-VL training with prompts shorter than max_prompt_length is unaffected (no truncation, all images retained)
Verify that non-VL model training paths are unaffected (_use_mrope is False)

When training Qwen2.5-VL with agent-lightning + verl, prompt truncation changes the token count but image_grid_thw is computed from the original (untruncated) image_urls. This causes get_rope_index to fail with a shape mismatch because it finds fewer image tokens in the truncated input_ids than entries in image_grid_thw. After prompt truncation, count remaining image regions in the truncated token sequence and slice image_urls to match before computing image_grid_thw, ensuring consistency between the token content and the mRoPE spatial metadata. Fixes microsoft#441

Mr-Neutr0n · 2026-02-12T18:11:42Z

Friendly bump! Let me know if there's anything I should update or improve to help move this forward.

Mr-Neutr0n force-pushed the fix/qwen-vl-mrope-truncation branch from bdd1c8d to ca0be5a on February 9, 2026, 22:01
