Add an optional HF persistent cache configuration#55
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds optional persistent Hugging Face model caching support to prevent models from being re-downloaded on every script execution, particularly useful for manual testing scenarios. The implementation creates a Docker volume for cache persistence when configured and passes the cache directory to model execution scripts.
Key Changes:
- Added logic to check for and utilize
persistent_hf_cache_dirconfiguration from context - Implemented Docker volume creation and mounting for persistent HF cache storage
- Refactored model execution command building to append
--hf_cache_dirargument when persistent cache is enabled
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| docker_options += self.get_mount_arg(mount_datapaths) | ||
|
|
||
| if persistent_hf_cache_dir: | ||
| docker_options += f" -v {hf_models_cache_volume_name}:{persistent_hf_cache_dir} " |
There was a problem hiding this comment.
The persistent_hf_cache_dir value from context is used directly in the Docker mount without validation. This could lead to arbitrary directory mounting if the context contains malicious input. Add validation to ensure the path is safe and doesn't contain dangerous characters or paths like /, /etc, or similar sensitive directories.
| model_args = self.context.ctx.get("model_args", info["args"]) | ||
|
|
||
| if persistent_hf_cache_dir: | ||
| model_args += f" --hf_cache_dir {persistent_hf_cache_dir} " |
There was a problem hiding this comment.
The persistent_hf_cache_dir is concatenated directly into the command string without shell escaping. If the path contains spaces or special characters, it could break the command or introduce command injection vulnerabilities. Use proper shell escaping (e.g., shlex.quote()) when building the command string.
|
LGTM |
Motivation
This PR introduces support for persistent Hugging Face model caching within Docker containers. This prevents models from being re-downloaded every time the script runs, which is especially useful during manual testing on private machines from Conductor. By allowing cache reuse, it reduces network overhead and speeds up repeated test cycles.
Technical Details
persistent_hf_cache_dirin the context.hf_models_cache_volume) whenpersistent_hf_cache_diris provided.--hf_cache_dirargument when persistent cache is enabled.Should be merged after:
Test Plan
Validated with manual job: https://mlseqa-ci.amd.com:9090/view/Experimental/job/Test-MAD-akochin/118/
Test Result
Submission Checklist