@agent-enemy-2

Hey all. I'm brand new to this repo and saw this issue was sitting in the backlog for a while. Looks like it had some traction and a potential draft PR, but nothing has happened on it since mid last year. I figured I'd get my feet wet with it.

This change adds token healing support to the /completion server endpoint. The idea is simple. When enabled (by setting the n_token_healing_enabled parameter in the API request), the server removes the last token from the input it sends to the model; call this the healer token. During sampling, it then checks that the first output token begins with the same text as the healer token. That way, if the prompt is cut off mid-word during auto-complete, the completion picks up from the partial word instead of interrupting the user's flow.
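To make the idea concrete, here is a minimal Python sketch of the healing step. It is not the PR's C++ server code. The names `heal_first_token` and `toy_sample` are hypothetical, tokens are plain strings rather than vocabulary IDs, and the retry loop is only my assumption about what n_token_healing_max_retries controls (resampling until a candidate starts with the healer text). What happens when the retries run out is not covered above, so the sketch simply returns None in that case.

```python
from typing import Callable, List, Optional, Tuple


def heal_first_token(
    prompt_tokens: List[str],
    sample: Callable[[List[str], int], str],
    max_retries: int = 5,
) -> Tuple[List[str], Optional[str]]:
    """Drop the last prompt token and look for a first output token that extends it.

    Returns (trimmed_prompt, new_text). new_text is the part of the accepted
    token that follows the healer text, or None if no sampled candidate
    started with the healer text within max_retries attempts.
    """
    if not prompt_tokens:
        return prompt_tokens, None

    # The last prompt token becomes the "healer" token; the model never sees it.
    *trimmed, healer = prompt_tokens

    for attempt in range(max_retries):
        candidate = sample(trimmed, attempt)
        # Accept only a token whose text starts with the healer text.
        if candidate.startswith(healer):
            return trimmed, candidate[len(healer):]

    # Assumed fallback: report failure and let the caller decide what to do.
    return trimmed, None


# Toy stand-in for the model: returns the attempt-th entry of a fixed ranking.
def toy_sample(context: List[str], attempt: int) -> str:
    ranked = [" Two", " Three", " Thr"]
    return ranked[attempt % len(ranked)]


prompt = ["Five", ",", " Four", ",", " Thr"]
trimmed, new_text = heal_first_token(prompt, toy_sample)
print(trimmed)   # ['Five', ',', ' Four', ',']
print(new_text)  # ee
```

In this toy run the first candidate, " Two", is rejected because it does not start with " Thr". The second, " Three", is accepted, and only the text after the healer ("ee") is returned as new content, which mirrors how the first example's output starts with "ee".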

Some example tests I've run:

Input:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Five, Four, Thr",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
ee, Two, One

Input:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "My favorite rock band is Green Da",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
y

Input:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Green Days best album was Ameri",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
can Idiot

One decision I had to make was whether to put this in the server logic itself or to create a new sampler function. Since only auto-complete tools are likely to use it, I think it makes sense in the server logic, but feel free to tell me if this is the wrong place. I'm also open to recommendations on this or any other aspect of the code.

@agent-enemy-2 changed the title from "Feat: Adding token handling auto complete" to "Feat: Adding token healing support for auto complete" on Feb 1, 2026