Add CFPO objective to GRPO trainer by asparius · Pull Request #5027 · huggingface/trl

asparius · 2026-02-09T13:38:27Z

Add CFPO objective to GRPO trainer

Summary

Adds support for the CFPO (Clipping-Free Policy Optimization) objective in the GRPO trainer.

CFPO replaces PPO-style ratio clipping with a smooth quadratic penalty. This removes the zero-gradient regions that clipping creates once the probability ratio leaves the trust region, while still keeping updates stable. The objective is fully differentiable and can be used as a drop-in alternative to clipping-based optimization.
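The per-token sketch below, in plain Python, shows where the zero-gradient regions come from. The penalty form r·A − |A|/(2ε)·(r − 1)² is an assumption: it is a common quadratic-penalty surrogate whose maximum sits at r = 1 + ε·sign(A), the same point where clipping kicks in. The exact CFPO objective is defined in the linked paper and may differ. All function names are hypothetical, not TRL APIs.

```python
def clipped_surrogate_grad(ratio: float, adv: float, eps: float = 0.2) -> float:
    """d/d(ratio) of PPO's min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * adv
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps) * adv
    # When the unclipped branch is selected the gradient is A; otherwise the
    # clipped branch is constant in r and the gradient is exactly zero.
    return adv if unclipped <= clipped else 0.0


def quadratic_penalty_surrogate(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Assumed clipping-free surrogate: linear term minus a quadratic penalty on (r - 1)."""
    return ratio * adv - abs(adv) / (2.0 * eps) * (ratio - 1.0) ** 2


def quadratic_penalty_grad(ratio: float, adv: float, eps: float = 0.2) -> float:
    """d/d(ratio) of the assumed surrogate: A - |A| / eps * (r - 1)."""
    # Zero only at r = 1 + eps * sign(A); beyond that point it pulls r back.
    return adv - abs(adv) / eps * (ratio - 1.0)


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.5):
        print(r, clipped_surrogate_grad(r, 1.0), quadratic_penalty_grad(r, 1.0))
```

At r = 1 both objectives have gradient A, so the first update step matches PPO. For a positive advantage and r = 1.5, the clipped gradient is 0 while the penalized one is negative, so the update actively moves the ratio back toward the trust region instead of stalling.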

Changes

Add CFPO loss option to GRPO trainer
Replace ratio clipping with quadratic penalty
No additional hyperparameters
Fully backward compatible (opt-in, default unchanged)
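The opt-in design listed above can be sketched as a per-token loss that dispatches on an objective name. The clipped branch stays the default, and the existing clipping range ε is reused as the penalty scale, which is one way to read "no additional hyperparameters". The helper name, the `objective` parameter, its `"clip"`/`"cfpo"` values, and the penalty form are all hypothetical; the real option name and formula are in the PR diff and the paper. Aggregation is a plain token mean for simplicity, not TRL's actual normalization.

```python
from typing import Sequence


def per_token_policy_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    objective: str = "clip",  # hypothetical switch; the default path is unchanged
    epsilon: float = 0.2,     # existing clip range, reused as the penalty scale
) -> float:
    """Mean negative surrogate over tokens (hypothetical helper, not a TRL API)."""
    if objective not in ("clip", "cfpo"):
        raise ValueError(f"unknown objective: {objective!r}")
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        if objective == "cfpo":
            # Assumed quadratic penalty in place of clipping.
            obj = r * a - abs(a) / (2.0 * epsilon) * (r - 1.0) ** 2
        else:
            # PPO-style clipped surrogate (the default behaviour).
            clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
            obj = min(r * a, clipped_r * a)
        total += -obj  # minimize the negative surrogate
    return total / len(ratios)


if __name__ == "__main__":
    print(per_token_policy_loss([1.5], [1.0]))          # clipped default
    print(per_token_policy_loss([1.5], [1.0], "cfpo"))  # opt-in path
```

Because callers that never pass the new value never reach the new branch, existing configurations keep their exact behaviour.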

Reference

https://arxiv.org/abs/2601.22801

Who can review?

@qgallouedec @kashif @albertvillanova

casinca · 2026-02-18T19:07:18Z

Hello, the implementation looks good. Not sure @qgallouedec will keep the docstring though, but I'm pretty sure you'll need to add the following before merging / for the reviewers:

an entry for your paper in docs/source/paper_index.md
your loss type in the @pytest.mark.parametrize list of test_training_loss_types in tests/test_grpo_trainer.py

Adding CFPO loss to GRPO Trainer (commit afa266b)

casinca mentioned this pull request on Feb 16, 2026, in #4548 "Add PSPO trust region method as alternative to clipping in GRPOTrainer" (open, 5 tasks).

asparius wants to merge 1 commit into huggingface:main from asparius:main
