fix: guard against division by zero in GPTRewardModel with empty batches by themavik · Pull Request #610 · CarperAI/trlx

themavik · 2026-02-11T07:09:56Z

Summary

GPTRewardModel.forward() crashes with a ZeroDivisionError when the input batch has 0 or 1 samples:

bs = input_ids.shape[0] // 2  # 0 when shape[0] < 2
...
loss = loss / bs  # ZeroDivisionError

Additionally, torch.stack(chosen_end_scores) raises a RuntimeError when bs == 0, because the list of scores it is given is empty.
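The arithmetic behind the crash can be reproduced without a model. This is a minimal sketch that assumes the loss accumulator starts as a plain Python number, as the reported ZeroDivisionError suggests. The function names are hypothetical and not part of trlx.

```python
def split_batch_size(num_rows: int) -> int:
    """Number of (chosen, rejected) pairs in a batch of num_rows rows."""
    return num_rows // 2


def average_loss_unguarded(num_rows: int) -> float:
    """Mimics the accumulate-then-divide pattern, with no guard for bs == 0."""
    bs = split_batch_size(num_rows)
    loss = 0  # assumption: plain Python accumulator, untouched if the loop never runs
    for _ in range(bs):
        loss += 1.0  # placeholder for a per-pair loss term
    return loss / bs  # raises ZeroDivisionError when bs == 0


for n in (0, 1, 2):
    try:
        print(n, average_loss_unguarded(n))
    except ZeroDivisionError as exc:
        print(n, "ZeroDivisionError:", exc)
```

Batches of 0 and 1 rows both yield bs == 0, so they fail in the same way. A batch of 2 rows is the smallest one that produces a pair.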

Changes

Add an early return guard after the batch size calculation. When bs == 0, the function returns:

loss: zero tensor on the input device
chosen_end_scores: empty tensor on the input device
rejected_end_scores: empty tensor on the input device
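The guard described above might look roughly like the following sketch. It is not the actual diff: the helper name `empty_batch_output` is hypothetical, and returning a dict with these three keys is an assumption based on the names listed in this description.

```python
import torch


def empty_batch_output(input_ids: torch.Tensor) -> dict:
    """Hypothetical helper: the result to return when no (chosen, rejected) pair exists."""
    device = input_ids.device  # keep every output on the same device as the input
    return {
        "loss": torch.tensor(0.0, device=device),
        "chosen_end_scores": torch.empty(0, device=device),
        "rejected_end_scores": torch.empty(0, device=device),
    }


input_ids = torch.zeros((1, 8), dtype=torch.long)  # a single, unpaired row
bs = input_ids.shape[0] // 2
if bs == 0:
    out = empty_batch_output(input_ids)
    print(out["loss"].item(), out["chosen_end_scores"].shape)
```

One caveat with this kind of guard: a constant zero tensor carries no gradient, so a training loop that calls backward() on the returned loss would still need to skip such batches.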

Test Plan

Batch with 2+ samples (normal paired input): behavior unchanged
Batch with 1 sample: returns zero loss and empty scores instead of crashing
Batch with 0 samples: returns zero loss and empty scores instead of crashing
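The three cases in the test plan could be exercised along these lines. The sketch uses a toy stand-in, `toy_pairwise_loss`, which is hypothetical and much simpler than GPTRewardModel.forward(): it takes one precomputed score per row and treats the first half of the rows as chosen and the second half as rejected.

```python
import torch
import torch.nn.functional as F


def toy_pairwise_loss(end_scores: torch.Tensor) -> dict:
    """Toy stand-in for a paired reward loss with an empty-batch guard."""
    bs = end_scores.shape[0] // 2
    if bs == 0:
        # Guarded path: no complete pair, so return zero loss and empty scores.
        device = end_scores.device
        return {
            "loss": torch.tensor(0.0, device=device),
            "chosen_end_scores": torch.empty(0, device=device),
            "rejected_end_scores": torch.empty(0, device=device),
        }
    chosen = end_scores[:bs]
    rejected = end_scores[bs:2 * bs]
    # Pairwise ranking loss: small when chosen scores exceed rejected scores.
    loss = -F.logsigmoid(chosen - rejected).mean()
    return {
        "loss": loss,
        "chosen_end_scores": chosen,
        "rejected_end_scores": rejected,
    }


for num_rows in (0, 1, 2):
    scores = torch.arange(num_rows, dtype=torch.float32)
    result = toy_pairwise_loss(scores)
    print(num_rows, result["loss"].item(), tuple(result["chosen_end_scores"].shape))
```

In a real test, the same three batch sizes would be passed to the model itself, with assertions on the returned loss and score shapes.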

Fixes #609

When the input batch has 0 or 1 samples, bs = input_ids.shape[0] // 2 evaluates to 0, causing ZeroDivisionError at loss = loss / bs and RuntimeError at torch.stack on empty lists.

Add an early return when bs == 0, returning zero loss and empty score tensors on the correct device.

Fixes CarperAI#609

Co-authored-by: Cursor <cursoragent@cursor.com>

themavik wants to merge 1 commit into CarperAI:main from themavik:fix-reward-model-empty-batch
