Conversation

@rohan-uiuc (Contributor) commented Nov 12, 2025

PR Type

Feature

Short Description

Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.

The code now checks if the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (or falls back to model_name). Users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args for automatic downloads to work.
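
A minimal sketch of this fallback logic (resolve_model_source and its parameter names are illustrative, not the PR's actual code; the change itself appears to live in vec_inf/client/_slurm_script_generator.py):

    from pathlib import Path

    def resolve_model_source(
        model_weights_parent: Path, model_name: str, vllm_args: dict[str, str]
    ) -> tuple[str, bool]:
        """Return the vLLM --model value and whether to bind mount local weights."""
        local_weights = model_weights_parent / model_name
        if local_weights.exists():
            # Local weights found: bind mount the directory into the container.
            return str(local_weights), True
        # No local weights: skip the bind mount and let vLLM download from
        # HuggingFace, preferring the full identifier passed via --model.
        return vllm_args.get("--model", model_name), False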

Fixes #166

Tests Added

  • test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include model weights path when directory doesn't exist
  • test_generate_launch_cmd_singularity_no_local_weights: Verifies launch command uses HF model identifier when local weights are missing (see the sketch after this list)
  • test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights
  • All existing tests pass (28 tests in test_slurm_script_generator.py, 116+ total tests)
  • Verified end-to-end: model downloads and serves successfully from HuggingFace when local weights don't exist and --model is specified in vllm_args
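
As referenced in the second bullet, here is a hedged sketch of the kind of check that test performs, written against the illustrative resolve_model_source above rather than the repo's actual test API (pytest's tmp_path fixture supplies a fresh directory; the test name here is illustrative):

    def test_uses_hf_identifier_when_weights_missing(tmp_path):
        # tmp_path / "weights" is never created, so no local weights exist.
        model_arg, should_mount = resolve_model_source(
            tmp_path / "weights",
            "Qwen2.5-7B-Instruct",
            {"--model": "Qwen/Qwen2.5-7B-Instruct"},
        )
        assert model_arg == "Qwen/Qwen2.5-7B-Instruct"
        assert should_mount is False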

@codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.86%. Comparing base (d11de79) to head (2db9c6c).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                    Patch %   Lines
vec_inf/client/_slurm_script_generator.py   91.30%    2 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   90.83%   90.86%   +0.03%     
==========================================
  Files          14       14              
  Lines        1342     1369      +27     
==========================================
+ Hits         1219     1244      +25     
- Misses        123      125       +2     
Files with missing lines                    Coverage Δ
vec_inf/client/_slurm_templates.py          100.00% <ø> (ø)
vec_inf/client/_utils.py                    79.88% <100.00%> (+0.99%) ⬆️
vec_inf/client/_slurm_script_generator.py   96.81% <91.30%> (-1.02%) ⬇️


@XkunW (Contributor) left a comment

Hi @rohan-uiuc, thanks for opening this; I left a few comments. Another thing worth considering is adding a check in the API to see if a model needs to be downloaded, and if so, only allowing the download if the HF cache directory env var is set, so that users don't accidentally download a model to their home directory and use up all their quota.
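
A hedged sketch of such a guard (HF_HOME and HF_HUB_CACHE are HuggingFace's standard cache environment variables; the function name and exact policy here are illustrative, not part of this PR):

    import os

    def ensure_hf_cache_configured() -> None:
        # Refuse on-the-fly downloads unless a cache location is explicitly set,
        # so weights don't land in ~/.cache and exhaust the user's home quota.
        if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
            raise RuntimeError(
                "No local weights found and no HF cache directory configured; "
                "set HF_HOME to a scratch location before downloading."
            )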

"""
launcher_script = ["\n"]

vllm_args_copy = self.params["vllm_args"].copy()
@XkunW (Contributor) commented on the hunk above:

Not sure if this is necessary, as the model name should be parsed with the launch command, not as part of --vllm-args.

@rohan-uiuc (Contributor, Author) replied:

Currently, model_name is the short name used for config lookup, log directories, and job naming (e.g., llama-3). The --model in vllm_args would allow users to specify the full HF path when downloading from HuggingFace.

I'm open to alternative approaches if you have a preference, such as:
  • Dedicated CLI option (e.g., --hf-model): keeps model_name as the short identifier and adds an explicit option for the full HF path
  • Reuse existing model_name: allow full HF paths directly, but adjust config lookups, log directory structure, etc. to handle paths containing /
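
For illustration, the first option might look like this (assuming a click-based CLI; --hf-model does not exist and is named only as in the discussion above):

    import click

    @click.command()
    @click.argument("model_name")
    @click.option(
        "--hf-model",
        default=None,
        help="Full HuggingFace identifier (e.g. Qwen/Qwen2.5-7B-Instruct) "
        "to download when local weights are absent.",
    )
    def launch(model_name: str, hf_model: str | None) -> None:
        # model_name stays the short config/log identifier (e.g. llama-3);
        # hf_model, when given, is what vLLM pulls from the Hub.
        target = hf_model or model_name
        click.echo(f"Launching {model_name}; vLLM model source: {target}")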



Development

Successfully merging this pull request may close these issues:
  • Add support for on-the-fly model downloads to llm-inference package
  • Support downloading model weights on the fly from HF
