
vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) for GRPO#5188

Open
lfranceschetti wants to merge 3 commits into huggingface:main from lfranceschetti:vllm-sync-strategy-lora-adapter

Conversation


@lfranceschetti lfranceschetti commented Feb 26, 2026

What does this PR do?

Add a new vllm_sync_strategy="lora_adapter" option for TRL's vLLM server mode that, when using PEFT LoRA/QLoRA, updates the vLLM-side policy by saving and reloading LoRA adapters instead of merging adapters and syncing full model weights (the current behavior).

This is intended to make GRPO/online RL training with PEFT more reliable and efficient — especially for QLoRA-style training — while keeping vllm_sync_strategy="weights" as the default.
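Enabling the new strategy would look roughly like the sketch below. The `vllm_sync_strategy` field is the parameter added by this PR; the surrounding `GRPOConfig`/`LoraConfig` arguments, the model name, and the `dataset`/`reward_fn` placeholders are illustrative only:

```python
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Illustrative LoRA setup; rank, alpha, and target modules depend on the base model.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

training_args = GRPOConfig(
    output_dir="grpo-lora",
    use_vllm=True,
    vllm_mode="server",                 # policy weights are synced to a separate vLLM server
    vllm_sync_strategy="lora_adapter",  # new opt-in strategy from this PR ("weights" is the default)
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,   # placeholder: any prompt dataset
    reward_funcs=reward_fn,  # placeholder: user-defined reward function(s)
    peft_config=peft_config,
)
trainer.train()
```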

Fixes

Other Advantages

Decoupled from parameter naming — The "weights" strategy iterates state_dict() keys and must match each one to the vLLM-side parameter name, which is fragile across PEFT versions and model architectures (see the manual prefix-stripping in #2818). The "lora_adapter" strategy saves a standard PEFT checkpoint and lets vLLM load it through its own adapter path — the two sides never need to agree on internal parameter names.

Prior PRs and Why This PR is Different

This approach was originally proposed in #2730 but was superseded by #2818, which chose the merge-sync-unmerge approach because it supports all PEFT adapter types (DoRA, IA3, etc.), not just LoRA.

This PR addresses that objection by:

  • Keeping vllm_sync_strategy="weights" as the default — full backward compatibility, works with any adapter type.
  • Offering vllm_sync_strategy="lora_adapter" as an opt-in optimization for the most common case (standard LoRA/rsLoRA), raising a clear error for non-LoRA adapter types (IA3, Prefix Tuning, etc.) and guiding users to the vllm_sync_strategy="weights" default.

The original #2730 noted a potential vLLM memory leak from repeatedly loading adapters. This might still be an issue; however, only very small memory increases have been observed during experimental runs.

Future Improvements

vLLM's load_inplace feature (vllm-project/vllm#31326, merged Jan 2026) could further improve this approach in the future, but it is not required here. A TRL adaptation is suggested in vllm-project/vllm#20149.

@lfranceschetti lfranceschetti changed the title Vllm sync strategy lora adapter vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) Feb 26, 2026
@lfranceschetti lfranceschetti changed the title vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) for GRPO Feb 26, 2026


Development

Successfully merging this pull request may close these issues.

[GRPO] bnb quantization + vllm GRPOTrainer fails to transfer weights to vLLM with _move_model_to_vllm after 7.5 hours of the job running
