Jacklanchantin/verifiable nonverifiable by jacklanchantin · Pull Request #1173 · facebookresearch/fairseq2

jacklanchantin · 2025-05-09T02:00:55Z

What does this PR do? Please describe:
A summary of the change or the issue that is fixed.

Fixes #{issue number}

Right now, we pass all prompts to the LLM reward model for synchronization. This needs to be optimized, e.g.:

In SuperReward:

Collect all prompts and task types on rank 0.
Depending on the task type, submit the prompts to their corresponding rewards.
Carefully scatter the results back to their ranks.

The biggest mismatch is that non-LLM rewards all run in parallel, whereas LLM rewards require a collective call, so the two paths have to be unified here.
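The gather, dispatch, and scatter steps described for SuperReward can be sketched as rank-0 bookkeeping. This is a minimal illustration, not the PR's implementation: the function dispatch_rewards, the (task_type, prompt) item shape, and the per-task reward callables are all hypothetical. The point it shows is that each reward, including an LLM judge, is called once per task type on a single merged batch, and every score is routed back to the rank and position it came from. In a real multi-process run, the per-rank lists would plausibly be collected with torch.distributed.gather_object(obj, object_gather_list, dst=0) and the results returned with torch.distributed.scatter_object_list(output_list, input_list, src=0); that wiring is omitted here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shapes: each item is (task_type, prompt); a reward fn scores a batch.
Item = Tuple[str, str]
RewardFn = Callable[[Sequence[str]], List[float]]


def dispatch_rewards(
    per_rank_items: List[List[Item]],
    reward_fns: Dict[str, RewardFn],
) -> List[List[float]]:
    """Meant to run on rank 0 after a gather: batch prompts by task type,
    score each batch once, then route every score back to (rank, position)."""
    # 1. Group prompts by task type, remembering where each one came from.
    groups: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for rank, items in enumerate(per_rank_items):
        for pos, (task_type, prompt) in enumerate(items):
            groups[task_type].append((rank, pos, prompt))

    # Pre-size per-rank outputs so scores land in each rank's original order.
    results: List[List[float]] = [[0.0] * len(items) for items in per_rank_items]

    # 2. One call per task type, so an LLM-based reward sees a single batch.
    for task_type, entries in groups.items():
        if task_type not in reward_fns:
            raise KeyError(f"no reward registered for task type {task_type!r}")
        scores = reward_fns[task_type]([prompt for _, _, prompt in entries])
        if len(scores) != len(entries):
            raise ValueError(
                f"reward for {task_type!r} returned {len(scores)} scores "
                f"for {len(entries)} prompts"
            )
        # 3. Scatter each score back to its originating rank and position.
        for (rank, pos, _), score in zip(entries, scores):
            results[rank][pos] = score
    return results


if __name__ == "__main__":
    per_rank = [
        [("math", "2+2"), ("chat", "hi")],  # items held by rank 0
        [("chat", "hello there")],          # items held by rank 1
    ]
    fns = {
        "math": lambda ps: [1.0 for _ in ps],
        "chat": lambda ps: [float(len(p)) for p in ps],
    }
    print(dispatch_rewards(per_rank, fns))  # [[1.0, 2.0], [11.0]]
```

Keeping the bookkeeping on rank 0 means only rank 0 would talk to an LLM reward service, which sidesteps having every rank join a collective for rewards it may not need; the trade-off is that rank 0 becomes a serialization point for all reward calls.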

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
Did you read the contributor guideline?
Did you make sure that your PR does only one thing instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

…2 into jacklanchantin/verifiable_nonverifiable

jacklanchantin · 2025-05-09T18:31:40Z

src/fairseq2/recipes/lm/_online_finetune/_rewards.py

+        # dummy vllm call for syncronization # FIXME
+        # if self.vllm_math_reward_model is not None:
+        log.info("MathVerifyVerifir generate_rewards()")
+        dummy_rewards = generate_rewards(


If I guard this call with if self.vllm_math_reward_model is not None (the attribute is None for non-combination runs), I get an NCCL error. I'm not sure what's happening there.
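One general property worth keeping in mind here: NCCL collectives must be entered by every rank in the process group, in the same order, so if some ranks skip a collective that others enter, the participating ranks block until a timeout or error. Whether that is the actual cause of the error above is not established in this thread. The toy below is only an analogy, assuming threads stand in for ranks and a threading.Barrier stands in for the collective; simulate_collective is a hypothetical helper, not part of fairseq2 or vLLM.

```python
import threading
from typing import Dict, List


def simulate_collective(participates: List[bool], timeout: float = 0.5) -> Dict[int, str]:
    """Toy model: each thread is a 'rank', the barrier is the 'collective'.

    A rank that skips the barrier leaves its peers waiting until the timeout,
    mimicking a collective that not every rank enters.
    """
    world_size = len(participates)
    barrier = threading.Barrier(world_size)
    status: Dict[int, str] = {}
    lock = threading.Lock()

    def rank_main(rank: int) -> None:
        if not participates[rank]:
            result = "skipped"  # this rank never enters the collective
        else:
            try:
                barrier.wait(timeout=timeout)
                result = "ok"  # every rank arrived, so the collective completed
            except threading.BrokenBarrierError:
                result = "stuck"  # timed out waiting for a rank that never came
        with lock:
            status[rank] = result

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    print(simulate_collective([True, True, True], timeout=2.0))
    print(simulate_collective([True, True, False], timeout=0.2))
```

The usual remedy for this class of problem is to make the decision to enter a collective identical on every rank: either call it unconditionally (passing an empty batch where a rank has nothing to score) or skip it on all ranks at once.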

…okresearch/fairseq2 into jacklanchantin/verifiable_nonverifiable

…2 into jacklanchantin/verifiable_nonverifiable

jacklanchantin added 4 commits May 2, 2025 23:09

change var name

18dace7

check if self._step_nr exists when syncing (#1160)

c6fc816

force_sync bug fix

7df6271

2 verifiers

c27b822

facebook-github-bot added the CLA Signed label May 9, 2025

brute force vllm reward synchronization

e22fbed

jacklanchantin changed the base branch from main to online_training May 9, 2025 05:34

jacklanchantin added 8 commits May 9, 2025 14:40

working ckpt

fcb6755

reward mapper

76c6775

superclass

0c9d658

change class

cc0d6b1

Merge branch 'online_training' of github.com:facebookresearch/fairseq…

da29581

…2 into jacklanchantin/verifiable_nonverifiable

cleanup

a63bbd4

cleanup

5db1f99

reorder

d135ee5

jacklanchantin commented May 9, 2025


jacklanchantin added 14 commits May 9, 2025 19:34

log

dc72481

separate val logging

52563e9

Merge branch 'jacklanchantin/normalized_rewards' of github.com:facebo…

4e425c9

…okresearch/fairseq2 into jacklanchantin/verifiable_nonverifiable

log tasks during training

31200d3

bug

c7328a8

comm sleep

74a4f85

new ray.get method for reward model

fbf29f0

mult rewards

5aad554

comment

754fd82

remove math tokenizer/wrapper

dabea7a

log

9fe1a89

logs

59cf67b

force athene verifier if batch mismatch

e6debd5

neurips checkpoint

1a9ea63

jacklanchantin added 8 commits May 21, 2025 19:28

merge

49c31c2

merge

e6d2146

merge

a790b85

typo

ee71556

merge fixes

924a02e

Merge branch 'online_training' of github.com:facebookresearch/fairseq…

0a1d1d4

…2 into jacklanchantin/verifiable_nonverifiable

Merge branch 'online_training' of github.com:facebookresearch/fairseq…

df7325f

…2 into jacklanchantin/verifiable_nonverifiable

cant throw error for mixed tasks yet

692f79a

jacklanchantin wants to merge 35 commits into online_training from jacklanchantin/verifiable_nonverifiable


2 participants
