Fix GRPO tool mask alignment after tool-call retokenization #5145

Open
MichalMraz wants to merge 1 commit into huggingface:main from MichalMraz:mmraz-fix

Conversation

@MichalMraz

What does this PR do?

Fixes #5144

This PR fixes a shape-mismatch bug in GRPOTrainer tool-call flow.

Root cause:

  • During _tool_call_loop, tool-round retokenization can make the completion part shorter than the previous completion.
  • tool_mask/logprobs were only extended, not truncated, so they could become longer than completion_ids.
  • This later crashes in _compute_loss when multiplying completion_mask * tool_mask.
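The mismatch described above can be sketched with plain Python lists standing in for the trainer's tensors (all lengths and names here are illustrative, taken from the example discussed later in this thread):

```python
# Minimal sketch of the length mismatch: after a tool round, retokenization
# shrank the completion, but tool_mask was only ever extended.
completion_ids = [0] * 41   # completion after tool-round retokenization (shorter)
tool_mask = [1] * 47        # stale mask from the previous 47-token completion

# _compute_loss multiplies completion_mask * tool_mask elementwise, which
# requires equal lengths; with real tensors this raises a shape-mismatch error.
mismatch = len(tool_mask) - len(completion_ids)
print(mismatch)  # 6 extra mask positions with no corresponding tokens
```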

Fix:

  • In trl/trainer/grpo_trainer.py, align tool_mask and logprobs to computed completion lengths by truncating or padding as needed before appending post-tool tokens.
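The alignment step can be sketched roughly as follows; `align_to_length` is a hypothetical helper for illustration, not the actual code in `trl/trainer/grpo_trainer.py`:

```python
def align_to_length(seq, target_len, pad_value):
    """Truncate or right-pad `seq` so that len(seq) == target_len."""
    if len(seq) > target_len:
        return seq[:target_len]  # retokenization shortened the completion
    return seq + [pad_value] * (target_len - len(seq))  # completion grew

# Before appending post-tool tokens, force the bookkeeping sequences
# to match the retokenized completion length (illustrative values).
completion_ids = [0] * 41   # new, shorter completion
tool_mask = [1] * 47        # stale, from the previous 47-token completion
logprobs = [0.0] * 47

tool_mask = align_to_length(tool_mask, len(completion_ids), pad_value=0)
logprobs = align_to_length(logprobs, len(completion_ids), pad_value=0.0)
assert len(tool_mask) == len(logprobs) == len(completion_ids)
```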

Tests:

  • Added regression test:
    tests/test_grpo_trainer.py::test_training_with_tools_keeps_masks_aligned_when_retokenization_shortens_completion
  • This test fails on old code with the tensor-size mismatch and passes with this patch.


@qgallouedec
Member

During _tool_call_loop, tool-round retokenization can make the completion part shorter than the previous completion.

shouldn't happen, what model do you use?

@MichalMraz
Author

I originally hit it with Qwen3-32B on transformers==5.0.0.
The test uses only a tiny Qwen model.

@qgallouedec
Member

qgallouedec commented Feb 21, 2026

ok, can you share the generated completion ids / tool calls when it occurs?

@MichalMraz
Author

For example, this kind of model response causes it:

assistant_text = (
        "<tool_call>\n"
        "{\n"
        '                                  "name"        :                 "multiply_tool",\n'
        '                                  "arguments"   :                 {\n'
        '                                                    "a"   :  3,\n'
        '                                                    "b"   :  4\n'
        "                                                }\n"
        "}\n"
        f"</tool_call>{tokenizer.eos_token}"
    )

The completion_ids are then
[151657, 198, 515, 6656, 330, 606, 1, 286, 549, 338, 330, 64648, 22785, 756, 6656, 330, 16370, 1, 256, 549, 338, 341, 6374, 330, 64, 1, 256, 549, 220, 220, 18, 345, 6374, 330, 65, 1, 256, 549, 220, 220, 19, 198, 4569, 456, 532, 151658, 151645]
and parsed tool_calls
[{'type': 'function', 'function': {'name': 'multiply_tool', 'arguments': {'a': 3, 'b': 4}}}]

Here

  • original completion length = 47
  • retokenized completion_tool length = 41

So the old logic computes a negative delta (41 - 47 = -6).

Yes, it is a somewhat degenerate case, but it sometimes happens during unstable training and causes a crash.
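To spell out the arithmetic: assuming append-only bookkeeping (as described in the PR body), a negative delta silently leaves the mask too long, because appending a negative number of elements is a no-op in Python. A sketch with illustrative names:

```python
orig_len = 47    # original completion length
retok_len = 41   # retokenized completion_tool length
delta = retok_len - orig_len   # -6

# Append-only logic adds nothing and trims nothing for a negative delta:
tool_mask = [1] * orig_len
tool_mask += [1] * delta       # [1] * -6 == [], so the mask keeps length 47
print(len(tool_mask), retok_len)
```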


Successfully merging this pull request may close these issues.

GRPOTrainer tool_mask can become longer than completion_ids after tool-call retokenization

2 participants