
Conversation


@agent-enemy-2 agent-enemy-2 commented Feb 3, 2026

Overview

Fix for #19217

Currently we set the LoRA config on every token request, even when the configuration has not changed between batches. It appears that an earlier change in llama-context.cpp (line 1078) added scheduler reserving for when new LoRA configs are set, so now, whenever a LoRA config is present, we reserve the scheduler for every batch we decode.

This change adds a field to the server slot that holds the previous batch's LoRA config. For each batch, we check whether the config matches the previous batch's, and only set it if it does not; otherwise we proceed without re-applying it.
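
In rough terms, the per-batch check looks like the sketch below; the equality helper and the lora_prev slot field are named for illustration and may differ from the actual diff:

// returns true when two LoRA configs reference the same adapters with the same scales
static bool lora_config_equal(const std::vector<common_adapter_lora_info> & a,
                              const std::vector<common_adapter_lora_info> & b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i].ptr != b[i].ptr || a[i].scale != b[i].scale) {
            return false;
        }
    }
    return true;
}

// in the batch loop: only re-apply (and thus re-reserve) when the config actually changed
if (!lora_config_equal(slot.lora, slot.lora_prev)) {
    common_set_adapter_lora(ctx, slot.lora);
    slot.lora_prev = slot.lora; // lora_prev: the new slot field (name illustrative)
}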

Testing

I was able to reproduce the issue relatively easily with some debug logs:

./llama-server -hf bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q8_0 --lora ~/Downloads/LoRA-Llama-3.1-8B-MultiReflection-f16.gguf -c 1024 --parallel 1 --host 0.0.0.0 --port 8080 --verbose

and was able to see this endlessly along with huge memory spikes:

Setting sched_need_reserve to TRUE in the set_adapter_lora functionset_embeddings: value = 0
Reserving memory during decoding

sched_reserve: reserving ...
sched_reserve: max_nodes = 3692
srv    operator(): http: streamed chunk: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" requests"}}],"created":1770077607,"id":"chatcmpl-1gqsKPcqCDHRRMZFNqfWqnl557ARXVc9","model":"bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q8_0","system_fingerprint":"b7916-0dfcd3b60","object":"chat.completion.chunk","timings":{"cache_n":0,"prompt_n":109,"prompt_ms":628.118,"prompt_per_token_ms":5.762550458715597,"prompt_per_second":173.5342722227352,"predicted_n":2,"predicted_ms":180.088,"predicted_per_token_ms":90.044,"predicted_per_second":11.105681666740704}}


sched_reserve: reserving full memory module
sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
sched_reserve:       MTL0 compute buffer size =   509.12 MiB
sched_reserve:        CPU compute buffer size =    18.01 MiB
sched_reserve: graph nodes  = 1903
sched_reserve: graph splits = 2

After these changes, the issue no longer appears and memory remains stable.

Member

@ggerganov ggerganov left a comment


Better to move the check inside llama_context::set_adapter_lora() and not set the sched_need_reserve flag if the same adapters are passed.

@agent-enemy-2 agent-enemy-2 force-pushed the bug-fix/memory-leak branch 2 times, most recently from e810f4d to 8edc85d on February 4, 2026 05:34
fix: moving the LoRA config check into common.cpp

Removing dangling unused import
Author

agent-enemy-2 commented Feb 4, 2026

I had to implement the check in common_set_adapter_lora because that method was clearing the loras every time and then setting the sched_need_reserve flag as well.

Added some helpers to validate whether the LoRA configs are the same; if they are, we skip clearing them and skip reserving the scheduler.
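
Sketched out, and assuming the current common_set_adapter_lora signature, the updated flow is roughly the following, reusing an equality helper like lora_config_equal from the earlier sketch; how the previously applied config is cached here is illustrative only:

void common_set_adapter_lora(struct llama_context * ctx, std::vector<common_adapter_lora_info> & lora) {
    // illustrative cache of the last applied config; the real storage may differ
    static std::vector<common_adapter_lora_info> lora_applied;

    if (lora_config_equal(lora, lora_applied)) {
        return; // same adapters and scales: skip the clear and the scheduler reserve
    }

    llama_clear_adapter_lora(ctx);
    for (auto & la : lora) {
        if (la.scale != 0.0f) {
            llama_set_adapter_lora(ctx, la.ptr, la.scale);
        }
    }

    lora_applied = lora;
}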

Member

ggerganov commented Feb 4, 2026

I had to implement the check in common_set_adapter_lora because that method was clearing the loras every time and then setting the sched_need_reserve flag as well.

Can't we update the logic in llama_set_adapter_lora() to first check if they are the same and, if they are, not do anything? We already have such logic in other functions, such as:

void llama_context::set_causal_attn(bool value) {
    LLAMA_LOG_DEBUG("%s: value = %d\n", __func__, value);

    if (cparams.causal_attn == value) {
        return;
    }

    cparams.causal_attn = value;

    sched_need_reserve = true;
}
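
Applying the same early-return pattern to set_adapter_lora could look roughly like the sketch below; the loras member name and its map type are assumptions based on the snippet above, not the actual implementation:

void llama_context::set_adapter_lora(llama_adapter_lora * adapter, float scale) {
    LLAMA_LOG_DEBUG("%s: adapter = %p, scale = %.2f\n", __func__, (void *) adapter, scale);

    // skip the reserve if this adapter is already applied with the same scale
    auto it = loras.find(adapter);
    if (it != loras.end() && it->second == scale) {
        return;
    }

    loras[adapter] = scale;

    sched_need_reserve = true;
}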

