
Add support for DPPO [WIP]#5065

Draft
catherinelee274 wants to merge 3 commits into huggingface:main from catherinelee274:clee_dppo

Conversation

@catherinelee274

What does this PR do?

Fixes #4998

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@catherinelee274 catherinelee274 changed the title Add support for DPPO Add support for DPPO [WIP] Feb 11, 2026
@@ -0,0 +1,164 @@
# DPPO Trainer

TRL supports the Decoupled Proximal Policy Optimization (DPPO) algorithm, which is a variant of PPO that decouples the optimization of the policy and value function for improved training stability. This implementation is based on the [Stable-RL](https://github.com/sail-sg/Stable-RL) paper.
It is "Divergence Proximal Policy Optimization".

TRL supports the Divergence Proximal Policy Optimization (DPPO) algorithm, which is a variant of PPO that substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL) for improved training efficiency and stability.
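To make the distinction concrete, here is a minimal sketch of the difference the reviewer describes: PPO's heuristic ratio clipping versus a surrogate that penalizes a direct per-sample divergence estimate. The function names, the `div_coef` penalty weight, and the total-variation estimate `0.5 * |ratio - 1|` are illustrative assumptions, not code from this PR's diff.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Standard PPO surrogate: heuristically clip the probability ratio
    # to [1 - eps, 1 + eps] and take the pessimistic (min) objective.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def divergence_constrained_loss(logp_new, logp_old, advantages, div_coef=1.0):
    # Hypothetical DPPO-style surrogate (illustrative, not the PR's code):
    # keep the unclipped objective and instead subtract a penalty built
    # from a direct divergence estimate between the new and old policies.
    ratio = torch.exp(logp_new - logp_old)
    # Per-sample total-variation-style estimate: 0.5 * |ratio - 1|,
    # which is zero when the policies agree on the sampled action.
    tv = 0.5 * (ratio - 1.0).abs()
    return -(ratio * advantages).mean() + div_coef * tv.mean()
```

When the new and old policies coincide (`logp_new == logp_old`), both surrogates reduce to `-advantages.mean()`; they diverge only as the policy ratio moves away from 1, where the clipped version flat-lines the gradient and the divergence-penalized version applies a smooth constraint.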

@catherinelee274 (Author) commented on Feb 27, 2026

TRL has a separate PR for this, so I will be closing this one in favor of that.



Development

Successfully merging this pull request may close these issues.

DPPO - Divergence Proximal Policy Optimization

2 participants