
Hg model training with torch DataLoader #1445

Closed
annasun28 wants to merge 13 commits into rsy/hg-model-v1 from ays/hg-model-v2

Conversation


@annasun28 annasun28 commented Nov 14, 2025

What does this PR do? Please describe:

  • Update the fairseq2 HgTokenizer with some methods required by the base transformers Tokenizer. TBD: should we actually subclass and override methods so that we automatically fall back to the base transformers Tokenizer implementations?
  • Enable FSDP for the Hg model (matching the way FSDP wrapping is handled in transformers / accelerate). Note: to override transformer_cls_names_to_wrap, one needs to register a new model family (see the corresponding seamless_next PR for an example).
  • Fix this issue:

Tested with https://github.com/fairinternal/seamless_next/pull/750
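On the subclass-vs-fallback question in the first bullet, the fallback behavior can also be prototyped with plain delegation rather than inheritance. A minimal sketch using stand-in classes (the class and method names below are illustrative, not the actual fairseq2 or transformers API):

```python
class BaseTokenizer:
    # Stand-in for the base transformers Tokenizer: provides generic methods.
    def convert_tokens_to_ids(self, tokens):
        return [len(t) for t in tokens]  # dummy ids for illustration


class HgTokenizer:
    # Hypothetical wrapper: implements only the fairseq2-specific methods and
    # falls back to the wrapped base tokenizer for everything else.
    def __init__(self, base):
        self._base = base

    def encode(self, text):
        # fairseq2-specific behavior lives here.
        return text.split()

    def __getattr__(self, name):
        # Invoked only when the attribute is not found on HgTokenizer itself,
        # so any base Tokenizer method is transparently delegated.
        return getattr(self._base, name)


tok = HgTokenizer(BaseTokenizer())
print(tok.encode("a b"))                       # prints ['a', 'b']
print(tok.convert_tokens_to_ids(["a", "bb"]))  # delegated; prints [1, 2]
```

Compared with subclassing, this avoids inheriting state and `__init__` requirements from the base class, at the cost of not passing `isinstance` checks against the base Tokenizer type.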

Does your PR introduce any breaking changes? If yes, please list them:
N/A

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 14, 2025
Contributor

This looks good. Just wondering if the process granularity is handled by the RecipeModel class during loading. I put a barrier() in there because occasionally the model info would be on another process and, because of interleaved execution, it would not be recognized.

Author

Hm, in the seamless_next recipe, I called model = model_hub.load_model(card, config=hg_model_config) before setting up the gangs, which I guess is why this was causing an error. Let me try initializing the gangs first and see. Just curious, though: why is barrier() needed for this HF model but not for others?

Contributor

From what I understand, _load_special_model in the factory sets a specific config instead of using AutoConfig. Qwen2.5-Omni does not accept AutoConfig because it is somewhat special-purpose. I noticed with torchrun that sometimes the main process would set the config correctly while the others would not, leading to the error "Invalid model config. Select from BertConfig, LlamaConfig, ..." on the other processes. I assumed they were not properly synced, so I thought that letting them all reach that point would resolve it. But it seems you grab the config directly first in your recipe, whereas I loaded from the card and let it find the config itself.
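The race described above, where one rank finishes config resolution while others have not, is the textbook use case for a barrier. A stdlib analogy of the fix, with threading.Barrier standing in for torch.distributed.barrier() and threads standing in for torchrun processes (in the real recipe you would call dist.barrier() after loading; everything below is illustrative):

```python
import threading

NUM_RANKS = 4
barrier = threading.Barrier(NUM_RANKS)  # stand-in for torch.distributed.barrier()
resolved_configs = [None] * NUM_RANKS


def rank_main(rank):
    # Each "rank" resolves the model config; some finish earlier than others.
    resolved_configs[rank] = "resolved-config"
    # Without this wait, a fast rank could proceed while a slow rank still has
    # an unresolved config, reproducing the "Invalid model config" symptom.
    barrier.wait()
    # Past the barrier, every rank is guaranteed to have resolved its config.
    assert all(c is not None for c in resolved_configs)


threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all ranks synced")  # prints all ranks synced
```

The barrier only masks the symptom, though; making every rank resolve the config the same way (as the recipe does by passing the config explicitly) removes the need for it.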

@rsyue
Copy link
Contributor

rsyue commented Nov 25, 2025

Currently working on cleaning this up so it can be merged. ETA is end of next week (around 12/5).

@rsyue rsyue marked this pull request as ready for review November 26, 2025 19:21
@annasun28 annasun28 closed this Feb 27, 2026