
Jacklanchantin/verifiable nonverifiable #1173

Draft
jacklanchantin wants to merge 35 commits into online_training from jacklanchantin/verifiable_nonverifiable

@jacklanchantin
Contributor

What does this PR do? Please describe:
A summary of the change or the issue that is fixed.

Fixes #{issue number}

right now, we pass all prompts to the LLM reward model purely for synchronization. this needs to be optimized, e.g.:

in SuperReward:

  1. collect all prompts and their task types on rank 0
  2. depending on task type, submit each prompt to its matching reward
  3. carefully scatter the rewards back to their original ranks

the biggest mismatch is that non-LLM rewards all run in parallel, but LLM rewards require a collective call, so the two paths have to be unified here
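
a rough sketch of the three steps above, for discussion only. it simulates ranks with plain Python lists so it runs without a process group; in a real run the gather and scatter would be collective calls (for example `torch.distributed.gather_object` and `torch.distributed.scatter_object_list`). the name `route_rewards`, the `(prompt, task_type)` sample shape, and the task type strings are all hypothetical and not taken from this PR.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes: a sample is (prompt, task_type); a reward function
# scores a whole batch of prompts and returns one float per prompt.
Sample = Tuple[str, str]
RewardFn = Callable[[List[str]], List[float]]


def route_rewards(
    per_rank_samples: List[List[Sample]],
    reward_fns: Dict[str, RewardFn],
) -> List[List[float]]:
    """Gather samples from all ranks, score them by task type, scatter back."""
    # Step 1: "gather" on rank 0, remembering where each sample came from.
    flat = []
    for rank, samples in enumerate(per_rank_samples):
        for local_idx, (prompt, task_type) in enumerate(samples):
            flat.append((rank, local_idx, prompt, task_type))

    # Step 2: group positions by task type so each reward gets one batched call.
    positions_by_task = defaultdict(list)
    for pos, (_, _, _, task_type) in enumerate(flat):
        positions_by_task[task_type].append(pos)

    scores = [0.0] * len(flat)
    for task_type, positions in positions_by_task.items():
        batch = [flat[pos][2] for pos in positions]
        results = reward_fns[task_type](batch)
        if len(results) != len(batch):
            raise ValueError(f"reward for {task_type!r} returned a wrong-sized batch")
        for pos, score in zip(positions, results):
            scores[pos] = score

    # Step 3: "scatter" each score back to its original rank and local index.
    per_rank_scores = [[0.0] * len(samples) for samples in per_rank_samples]
    for (rank, local_idx, _, _), score in zip(flat, scores):
        per_rank_scores[rank][local_idx] = score
    return per_rank_scores
```

with this shape, a rank that holds no LLM-scored prompts (or no prompts at all) still takes part in the same gather and scatter, which is the unification described above.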

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@facebook-github-bot facebook-github-bot added the CLA Signed label May 9, 2025
@jacklanchantin jacklanchantin changed the base branch from main to online_training May 9, 2025 05:34
# dummy vllm call for synchronization # FIXME
# if self.vllm_math_reward_model is not None:
log.info("MathVerifyVerifier generate_rewards()")
dummy_rewards = generate_rewards(
Contributor Author

if i guard this with a check for self.vllm_math_reward_model is not None (it will be None for non-combination runs), i get an NCCL error. not sure what's happening there


2 participants