
Conversation

@ggerganov (Member) commented Feb 2, 2026

fix #19267

Memory modules that do not support removing trailing tokens from the context (such as recurrent modules) cannot be used for speculative decoding. Add a new common_speculative_is_compat() to query this capability, and use it in llama-server to disable speculative decoding for such contexts.
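A minimal sketch of the resulting gating in llama-server, assuming common_speculative_is_compat() takes the target llama_context and returns a bool (the exact signature is not shown in this thread), with use_speculative as an illustrative local flag:

```cpp
// Sketch only: the helper name comes from this PR; its signature and
// the surrounding flag are assumptions for illustration.
bool use_speculative = true; // e.g. the user configured ngram drafting

if (use_speculative && !common_speculative_is_compat(ctx)) {
    // recurrent memory modules cannot remove trailing tokens, so
    // rejected draft tokens could not be rolled back
    LOG_WRN("%s: speculative decoding is not compatible with this context, disabling\n", __func__);
    use_speculative = false;
}
```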

@ngxson (Collaborator) left a comment


Just wondering if we can do this inside common_speculative_init instead.

For example, common_speculative_init can try to evaluate 2 tokens, then remove the first one. If llama_memory_seq_rm returns an error, we can throw an error saying the model is not compatible.

Btw, I think it's better to throw an error and exit rather than just print a warning.
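A minimal sketch of that probe, assuming the current llama_memory_seq_rm() semantics (it returns false when a partial removal is unsupported, as with recurrent modules, while removing an entire sequence always succeeds). The function name and token values are illustrative, and this version probes suffix removal, which is the operation speculative decoding actually relies on:

```cpp
#include "llama.h"

// Illustrative probe: decode two tokens, then attempt a partial removal.
// Recurrent memory modules cannot roll back individual positions, so the
// partial llama_memory_seq_rm() call fails for them.
static bool speculative_probe_compat(llama_context * ctx, llama_token tok) {
    llama_token tokens[2] = { tok, tok }; // e.g. the BOS token

    if (llama_decode(ctx, llama_batch_get_one(tokens, 2)) != 0) {
        return false;
    }

    llama_memory_t mem = llama_get_memory(ctx);

    // try to remove only the second token (positions [1, end))
    const bool compat = llama_memory_seq_rm(mem, 0, 1, -1);

    // removing a whole sequence is supported by every memory module,
    // so the probe tokens can always be cleaned up afterwards
    llama_memory_seq_rm(mem, 0, -1, -1);

    return compat;
}
```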

@coder543 commented Feb 3, 2026

I just ran into #19267, and it would be cool if there were a way to make this compatible rather than just disabling it, but disabling it is better than crashing. With Qwen3-Coder-Next, ngram-mod could provide large speedups during coding workflows.

@ggerganov (Member, Author) commented

@ngxson Implemented this idea in a new common_speculative_is_compat() helper function.

> Btw, I think it's better to throw an error and exit rather than just print a warning.

Do you have something specific in mind? In my server config, I want to set ngram-based spec decoding as a default and have it applied to all routed models. When a routed model does not support it, the server should simply continue to work without it, so I think a warning is better.

@ggerganov changed the title from "llama : add llama_memory_can_rm_suffix()" to "common : add common_speculative_is_compat()" on Feb 4, 2026


Successfully merging this pull request may close: Eval bug: spec-type ngram-mod crash with Qwen3Next (#19267)
