vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) for GRPO #5188
lfranceschetti wants to merge 3 commits into huggingface:main
What does this PR do?
Add a new `vllm_sync_strategy="lora_adapter"` option for TRL's vLLM server mode that, when using PEFT LoRA/QLoRA, updates the vLLM-side policy by saving and reloading LoRA adapters instead of merging adapters and syncing full model weights (the current behavior). This is intended to make GRPO/online RL training with PEFT more reliable and efficient, especially for QLoRA-style training, while keeping `vllm_sync_strategy="weights"` as the default.

Fixes
- Sync is slow and memory-intensive: the merge-sync-unmerge loop iterates over all named parameters and sequentially updates each one on the vLLM server. With ZeRO-3 this can lead to OOMs (also from my own experience). With `vllm_sync_strategy="lora_adapter"`, only the small LoRA checkpoint is written and loaded. Relates to "GRPOTrainer: Parallelism for updating named params" #3557 (fixes it if LoRA is used).
- QLoRA merge causes quantization rounding errors: merging LoRA weights into a 4-bit quantized base model and then unmerging is lossy. The `vllm_sync_strategy="lora_adapter"` approach avoids merging entirely. Fixes "[GRPO] bnb quantization + vllm" #3466; however, support for 4-bit models on the vLLM server still needs to be implemented (working on it).
- NCCL weight transfer is brittle over long runs: the current approach sometimes fails after hours of training due to NCCL communication errors. The `vllm_sync_strategy="lora_adapter"` approach replaces NCCL with a simple file write plus an HTTP reload request, which is more robust. Fixes "GRPOTrainer fails to transfer weights to vLLM with `_move_model_to_vllm` after 7.5 hours of the job running" #2840 (if LoRA is used).

Other Advantages
Decoupled from parameter naming: the `"weights"` strategy iterates `state_dict()` keys and must match each one to the vLLM-side parameter name, which is fragile across PEFT versions and model architectures (see the manual prefix stripping in #2818). The `"lora_adapter"` strategy saves a standard PEFT checkpoint and lets vLLM load it through its own adapter path, so the two sides never need to agree on internal parameter names.

Prior PRs and Why This PR is Different
This approach was originally proposed in #2730 but was superseded by #2818, which chose the merge-sync-unmerge approach because it supports all PEFT adapter types (DoRA, IA3, etc.), not just LoRA.
This PR addresses that objection by:

- Keeping `vllm_sync_strategy="weights"` as the default: full backward compatibility, works with any adapter type.
- Adding `vllm_sync_strategy="lora_adapter"` as an opt-in optimization for the most common case (standard LoRA/rsLoRA), raising a clear error for non-LoRA adapter types (IA3, Prefix Tuning, etc.) and guiding users back to the `vllm_sync_strategy="weights"` default.

The original #2730 noted a potential vLLM memory leak from repeatedly loading adapters. This might still be an issue; however, only very small memory increases have been observed during experimental runs.
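To illustrate the parameter-naming fragility mentioned under "Other Advantages", here is a minimal sketch of the kind of key rewriting the `"weights"` strategy must perform before vLLM recognizes the names. The exact prefixes vary by PEFT version; the helper below is illustrative, not TRL's actual code.

```python
# Illustrative only, not TRL's actual implementation. PEFT wraps the
# base model, so state_dict() keys carry wrapper prefixes that the
# vLLM side does not know about.
def to_vllm_name(peft_key: str) -> str:
    """Map a PEFT-wrapped parameter name back to the base-model name."""
    # PEFT typically nests the model under `base_model.model.` ...
    prefix = "base_model.model."
    if peft_key.startswith(prefix):
        peft_key = peft_key[len(prefix):]
    # ... and wraps adapted linears so the original weight lives
    # under a `.base_layer` attribute.
    return peft_key.replace(".base_layer", "")

print(to_vllm_name(
    "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight"
))
# -> model.layers.0.self_attn.q_proj.weight
```

Any change in how PEFT composes these prefixes silently breaks the mapping, which is exactly the coupling the `"lora_adapter"` strategy avoids.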
Future Improvements
vLLM's `load_inplace` feature (vllm-project/vllm#31326, merged Jan 2026) could further improve this idea in the future but is not required. A TRL adaptation is suggested in vllm-project/vllm#20149.
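For concreteness, the file-write-plus-HTTP-reload flow described above could look roughly like the sketch below. `save_pretrained` is the standard PEFT API for writing an adapter checkpoint; the endpoint name, payload fields, and helper names are assumptions for illustration, not the actual TRL server API.

```python
import json
import urllib.request


def build_reload_payload(adapter_name: str, adapter_path: str) -> dict:
    # Payload fields are assumptions; the real server API may differ.
    return {"lora_name": adapter_name, "lora_path": adapter_path}


def sync_via_adapter_reload(peft_model, server_url: str, step: int, out_dir: str) -> None:
    """Sketch: sync the policy by writing the small LoRA checkpoint and
    asking the vLLM server to reload it over HTTP (no NCCL involved)."""
    # 1) Write only the LoRA adapter (megabytes, not full model weights).
    adapter_path = f"{out_dir}/policy_step_{step}"
    peft_model.save_pretrained(adapter_path)
    # 2) A single HTTP request replaces the per-parameter NCCL broadcast.
    body = json.dumps(
        build_reload_payload(f"policy_step_{step}", adapter_path)
    ).encode()
    req = urllib.request.Request(
        f"{server_url}/load_lora_adapter",  # hypothetical endpoint name
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Because the transfer is an ordinary file write plus a request to a stateless HTTP endpoint, a failed sync can simply be retried, unlike a wedged NCCL collective.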