Hg model training with torch DataLoader #1445
annasun28 wants to merge 13 commits into rsy/hg-model-v1 from
Conversation
This looks good. Just wondering if the process granularity is handled by the RecipeModel class during loading. I put barrier() in there because occasionally the model info would end up on another process and would not be recognized due to interleaving.
Hm, in the seamless_next recipe, I called model = model_hub.load_model(card, config=hg_model_config) before setting up the gangs, which I guess is why this was causing an error. Let me try initializing the gangs first and see. But just curious: why is barrier() needed for this HF model, but not for others?
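A hypothetical, self-contained sketch of the call-order fix discussed here: set up the gangs (process groups) before calling load_model, so that any barrier() inside model loading has an initialized group to synchronize on. ModelHub, setup_gangs, and the card/config values below are illustrative stubs, not the real fairseq2 API.

```python
class ModelHub:
    """Stub hub illustrating why gang setup must precede load_model."""

    def __init__(self):
        self._gangs_ready = False

    def setup_gangs(self):
        # Stands in for initializing the torch.distributed process groups.
        self._gangs_ready = True

    def load_model(self, card, config=None):
        # In the real code path, a barrier() here would hang or error if
        # the process groups were never initialized.
        if not self._gangs_ready:
            raise RuntimeError("gangs must be set up before load_model")
        return {"card": card, "config": config}

hub = ModelHub()
hub.setup_gangs()  # initialize gangs first
model = hub.load_model("qwen2.5-omni", config={"dtype": "bf16"})
```

Reversing the two calls on the stub raises immediately, which mirrors the error seen when the recipe loaded the model before gang setup.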
From what I understand, _load_special_model in the factory sets a specific config instead of using AutoConfig. Qwen2.5-Omni does not accept AutoConfig because it is somewhat special-purpose. I noticed with torchrun that sometimes the main process would set the config correctly while the others would not, leading to an error on the other ranks: "Invalid model config. Select from BertConfig, LlamaConfig, ...". I assumed this was because the processes were not properly synced, so I thought letting them all reach that point would resolve it. But it seems like you grab the config directly first in your recipe, whereas I loaded from the card and let it find the config itself.
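The race described above can be illustrated single-machine with Python's stdlib, using threading.Barrier as a stand-in for dist.barrier(): one worker ("rank 0") registers the special config, and the barrier keeps the other workers from looking it up before registration has happened. This is an analogy of the sync problem, not the actual fairseq2 or transformers code.

```python
import threading

registry = {}  # stands in for the model-config registry
WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)
results = {}

def worker(rank):
    if rank == 0:
        # Like _load_special_model setting a specific (non-AutoConfig) config.
        registry["qwen2.5-omni"] = "special-config"
    # Without this wait, ranks 1..3 could read the registry before rank 0
    # has written to it, and fail with an "invalid model config" style error.
    barrier.wait()
    results[rank] = registry["qwen2.5-omni"]

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the barrier, every worker observes the registered config, which is exactly the guarantee the barrier() in RecipeModel is meant to provide across ranks.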
Currently working on cleaning this up so it can be merged. ETA is end of next week (around 12/5).
What does this PR do? Please describe:
transformers/accelerate). Note: to override transformer_cls_names_to_wrap, one needs to register a new model family (see the corresponding seamless_next PR for an example).

Tested with https://github.com/fairinternal/seamless_next/pull/750
Does your PR introduce any breaking changes? If yes, please list them:
N/A
Check list: