
Add CFPO objective to GRPO trainer #5027

Open
asparius wants to merge 1 commit into huggingface:main from asparius:main

Conversation

asparius (Contributor) commented Feb 9, 2026

Add CFPO objective to GRPO trainer

Summary

Adds support for the CFPO (Clipping-Free Policy Optimization) objective in the GRPO trainer.

CFPO replaces PPO-style ratio clipping with a smooth quadratic penalty, removing zero-gradient regions while maintaining stable updates. The objective is fully differentiable and can be used as a drop-in alternative to clipping-based optimization.
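To make the difference concrete, here is a minimal sketch of the two surrogate objectives in PyTorch. The quadratic-penalty form is an illustrative reading of the description above, not necessarily the paper's exact formulation, and `beta` is a hypothetical penalty weight (the PR states no additional hyperparameters are added):

```python
import torch

def clipped_surrogate(log_ratio, advantages, eps=0.2):
    # PPO-style clipped objective: once the ratio leaves [1-eps, 1+eps]
    # in the penalized direction, the gradient is exactly zero.
    ratio = torch.exp(log_ratio)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def quadratic_penalty_surrogate(log_ratio, advantages, beta=0.1):
    # Clipping-free alternative: a smooth (ratio - 1)^2 penalty keeps the
    # objective differentiable everywhere while still discouraging large
    # policy updates. `beta` is a hypothetical weight for illustration.
    ratio = torch.exp(log_ratio)
    return -(ratio * advantages - beta * (ratio - 1.0) ** 2).mean()
```

At, say, `log_ratio = 0.5` with a positive advantage, the clipped objective sits in its flat region and contributes no gradient, while the penalty form still provides a learning signal, which is the zero-gradient issue the PR description refers to.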

Changes

  • Add CFPO loss option to GRPO trainer
  • Replace ratio clipping with quadratic penalty
  • No additional hyperparameters
  • Fully backward compatible (opt-in, default unchanged)

Reference

https://arxiv.org/abs/2601.22801

Who can review?

@qgallouedec @kashif @albertvillanova

casinca (Contributor) commented Feb 18, 2026

Hello, the implementation looks good. I'm not sure @qgallouedec will keep the docstring, though, and you'll likely need the following before merging / for the reviewers:

  • an entry for your paper in docs/source/paper_index.md
  • your loss type added to the @pytest.mark.parametrize list of test_training_loss_types in tests/test_grpo_trainer.py
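The test change might look like the following sketch; the other loss-type names listed here are assumptions about what tests/test_grpo_trainer.py currently parametrizes over, and the test body is only indicative:

```python
import pytest

# "cfpo" is the new entry; the other names are assumed existing loss types.
LOSS_TYPES = ["grpo", "bnpo", "dr_grpo", "cfpo"]

@pytest.mark.parametrize("loss_type", LOSS_TYPES)
def test_training_loss_types(loss_type):
    # The real test would run a short GRPOTrainer training loop with this
    # loss_type and assert that training produces a finite loss.
    assert loss_type in LOSS_TYPES
```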
