Feature: Support downloading model weights on-the-fly from HuggingFace (#166) #167
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   90.83%   90.86%   +0.03%
==========================================
  Files          14       14
  Lines        1342     1369      +27
==========================================
+ Hits         1219     1244      +25
- Misses        123      125       +2
```
XkunW left a comment
Hi @rohan-uiuc, thanks for opening this, I left a few comments. Another thing worth considering is adding a check in the API to see whether a model needs to be downloaded, and if so, only allowing the download when the HF cache directory env var is set, so that users don't accidentally download a model to their home directory and use up all their quota.
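A minimal sketch of that guard, assuming the standard HuggingFace cache environment variables (HF_HOME / HF_HUB_CACHE); the helper name and error message are illustrative, not part of this PR:

```python
import os
from pathlib import Path


def ensure_hf_cache_configured() -> Path:
    """Refuse on-the-fly downloads unless an HF cache directory is set.

    Illustrative guard: checks the standard HuggingFace cache env vars so
    that downloaded weights never land in the user's home directory.
    """
    cache_dir = os.environ.get("HF_HUB_CACHE") or os.environ.get("HF_HOME")
    if cache_dir is None:
        raise RuntimeError(
            "Model weights not found locally and no HF cache directory is "
            "configured; set HF_HOME or HF_HUB_CACHE to a location with "
            "enough quota before enabling on-the-fly downloads."
        )
    return Path(cache_dir)
```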
| """ | ||
| launcher_script = ["\n"] | ||
|
|
||
| vllm_args_copy = self.params["vllm_args"].copy() |
XkunW: Not sure if this is necessary, as the model name should be parsed from the launch command rather than passed as part of --vllm-args.
rohan-uiuc: Currently, model_name is the short name used for config lookup, log directories, and job naming (e.g., llama-3). The --model in vllm_args would allow users to specify the full HF path when downloading from HuggingFace.

I'm open to alternative approaches if you have a preference, like:

- Dedicated CLI option (e.g., --hf-model): keeps model_name as the short identifier and adds an explicit option for the full HF path
- Reuse existing model_name: allow full HF paths directly, but adjust config lookups, log directory structure, etc. to handle paths containing /
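For illustration, a hedged sketch of the resolution order described above (explicit --model in vllm_args, falling back to model_name); the helper and the dict-shaped vllm_args are assumptions, not the repo's actual structures:

```python
def resolve_model_identifier(params: dict) -> str:
    """Prefer an explicit --model from vllm_args, else use the short name.

    Hypothetical helper: model_name (e.g. "llama-3") keys config lookups,
    log directories, and job names, while --model may carry the full HF
    path (e.g. "Qwen/Qwen2.5-7B-Instruct") for on-the-fly downloads.
    """
    vllm_args = params.get("vllm_args") or {}
    # Assumes vllm_args has already been parsed into a flag -> value dict.
    return vllm_args.get("--model") or params["model_name"]
```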
PR Type
Feature
Short Description
Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.
The code now checks if the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (or falls back to model_name). Users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args for automatic downloads to work.

Fixes #166
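A rough sketch of that control flow, assuming a Singularity-style --bind flag and the hypothetical resolve_model_identifier helper above; the real method names in the PR may differ:

```python
from pathlib import Path


def build_bind_mount_args(weights_parent: Path, model_name: str) -> list[str]:
    """Return Singularity bind-mount args, or nothing if weights are absent.

    Sketch only: when the local weights directory exists it is mounted
    into the container; otherwise no mount is added, and vLLM downloads
    the weights from HuggingFace using the full model identifier.
    """
    weights_dir = weights_parent / model_name
    if weights_dir.is_dir():
        return ["--bind", f"{weights_dir}:{weights_dir}"]
    return []  # no local weights: fall through to an HF download
```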
Tests Added
- test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include the model weights path when the directory doesn't exist
- test_generate_launch_cmd_singularity_no_local_weights: Verifies the launch command uses the HF model identifier when local weights are missing
- test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights when --model is specified in vllm_args
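As a hedged illustration of the fallback behavior these tests cover, written against the hypothetical resolve_model_identifier helper sketched earlier rather than the repo's real test fixtures:

```python
def test_uses_full_hf_path_when_model_flag_is_set():
    params = {
        "model_name": "qwen2.5-7b",
        "vllm_args": {"--model": "Qwen/Qwen2.5-7B-Instruct"},
    }
    assert resolve_model_identifier(params) == "Qwen/Qwen2.5-7B-Instruct"


def test_falls_back_to_model_name_without_override():
    params = {"model_name": "llama-3", "vllm_args": {}}
    assert resolve_model_identifier(params) == "llama-3"
```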