Hg model training with torch DataLoader #1445
annasun28 wants to merge 13 commits into rsy/hg-model-v1 from
Conversation
This looks good. Just wondering if the process granularity is handled by the RecipeModel class during loading. I put barrier() in there because occasionally the model info would end up on another process and would not be recognized due to interleaving.
Hm, in the seamless_next recipe, I called model = model_hub.load_model(card, config=hg_model_config) before setting up the gangs, which I guess is why this was causing an error. Let me try initializing the gangs first and see. But just curious: why is barrier() needed for this HF model, but not for others?
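A hypothetical, self-contained sketch of the call-order fix discussed here: set up the gangs (process groups) before calling load_model, so that any barrier() inside model loading has an initialized group to synchronize on. ModelHub, setup_gangs, and the card/config values below are illustrative stubs, not the real fairseq2 API.

```python
class ModelHub:
    """Stub hub illustrating why gang setup must precede load_model."""

    def __init__(self):
        self._gangs_ready = False

    def setup_gangs(self):
        # Stands in for initializing the torch.distributed process groups.
        self._gangs_ready = True

    def load_model(self, card, config=None):
        # In the real code path, a barrier() here would hang or error if
        # the process groups were never initialized.
        if not self._gangs_ready:
            raise RuntimeError("gangs must be set up before load_model")
        return {"card": card, "config": config}

hub = ModelHub()
hub.setup_gangs()  # initialize gangs first
model = hub.load_model("qwen2.5-omni", config={"dtype": "bf16"})
```

Reversing the two calls on the stub raises immediately, which mirrors the error seen when the recipe loaded the model before gang setup.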
From what I understand, _load_special_model in the factory sets a specific config instead of using AutoConfig. Qwen2.5-Omni does not accept AutoConfig because it is somewhat special-purpose. I noticed with torchrun that sometimes the main process would set the config correctly while the others would not, leading to an error on the other ranks: "Invalid model config. Select from BertConfig, LlamaConfig, ...". I assumed this was because the processes were not properly synced, so I thought letting them all reach that point would resolve it. But it seems like you grab the config directly first in your recipe, whereas I loaded from the card and let it find the config itself.
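The race described above can be illustrated single-machine with Python's stdlib, using threading.Barrier as a stand-in for dist.barrier(): one worker ("rank 0") registers the special config, and the barrier keeps the other workers from looking it up before registration has happened. This is an analogy of the sync problem, not the actual fairseq2 or transformers code.

```python
import threading

registry = {}  # stands in for the model-config registry
WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)
results = {}

def worker(rank):
    if rank == 0:
        # Like _load_special_model setting a specific (non-AutoConfig) config.
        registry["qwen2.5-omni"] = "special-config"
    # Without this wait, ranks 1..3 could read the registry before rank 0
    # has written to it, and fail with an "invalid model config" style error.
    barrier.wait()
    results[rank] = registry["qwen2.5-omni"]

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the barrier, every worker observes the registered config, which is exactly the guarantee the barrier() in RecipeModel is meant to provide across ranks.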
Currently working on cleaning this up so it can be merged. ETA is end of next week (around 12/5).
What does this PR do? Please describe:
transformers/accelerate). Note: to override transformer_cls_names_to_wrap, one needs to register a new model family (see the corresponding seamless_next PR for an example).

Tested with https://github.com/fairinternal/seamless_next/pull/750
Does your PR introduce any breaking changes? If yes, please list them:
N/A
Check list: