
Add support for DPPO [WIP]#5065

Draft
catherinelee274 wants to merge 3 commits into huggingface:main from catherinelee274:clee_dppo

Conversation

@catherinelee274

What does this PR do?

Fixes #4998

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@catherinelee274 catherinelee274 changed the title Add support for DPPO Add support for DPPO [WIP] Feb 11, 2026
@@ -0,0 +1,164 @@
# DPPO Trainer

TRL supports the Decoupled Proximal Policy Optimization (DPPO) algorithm, which is a variant of PPO that decouples the optimization of the policy and value function for improved training stability. This implementation is based on the [Stable-RL](https://github.com/sail-sg/Stable-RL) paper.
It is "Divergence Proximal Policy Optimization".

TRL supports the Divergence Proximal Policy Optimization (DPPO) algorithm, which is a variant of PPO that substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL) for improved training efficiency and stability.
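To make the distinction concrete, here is a minimal sketch of the difference the reviewer describes: PPO's heuristic ratio clipping versus a surrogate that penalizes a direct per-sample divergence estimate. The function names, the `div_coef` penalty weight, and the total-variation estimate `0.5 * |ratio - 1|` are illustrative assumptions, not code from this PR's diff.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Standard PPO surrogate: heuristically clip the probability ratio
    # to [1 - eps, 1 + eps] and take the pessimistic (min) objective.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def divergence_constrained_loss(logp_new, logp_old, advantages, div_coef=1.0):
    # Hypothetical DPPO-style surrogate (illustrative, not the PR's code):
    # keep the unclipped objective and instead subtract a penalty built
    # from a direct divergence estimate between the new and old policies.
    ratio = torch.exp(logp_new - logp_old)
    # Per-sample total-variation-style estimate: 0.5 * |ratio - 1|,
    # which is zero when the policies agree on the sampled action.
    tv = 0.5 * (ratio - 1.0).abs()
    return -(ratio * advantages).mean() + div_coef * tv.mean()
```

When the new and old policies coincide (`logp_new == logp_old`), both surrogates reduce to `-advantages.mean()`; they diverge only as the policy ratio moves away from 1, where the clipped version flat-lines the gradient and the divergence-penalized version applies a smooth constraint.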

@catherinelee274 (Author) commented on Feb 27, 2026

TRL has a separate PR for this, so I will be closing this one in favor of that.



Development

Successfully merging this pull request may close these issues.

DPPO - Divergence Proximal Policy Optimization

2 participants