spec: ngram-mod, score-based pruning #19294
Draft
Related to #19164; proof of concept for the idea in #19164 (comment).
Track a capped score for each ngram in the pool, initially set to SCORE_INS on insertion. If an ngram is used successfully in a draft, its score is counted up; if the draft is rejected, its score is counted down. On low-acceptance streaks, remove all ngrams with a score lower than SCORE_THR.
I only did superficial testing, but the speed-up is more consistent throughout the whole request: there are no more sudden drops in speed-up after (early) low-acceptance streaks. Pruning currently iterates over all 4M entries of the cache pool, and the scoring has a minor but noticeable performance cost, so there is still room for optimization.
I also added some hash pool statistics (scoring state and collisions) that might help further fine-tune the parameters (SCORE_MIN, SCORE_MAX, SCORE_INS, ...).
Below are logs for a prompt of the form [GIVEN_SOURCE_CODE|TASK], where the model was tasked to generate [A|GIVEN_SOURCE_CODE|B|GIVEN_SOURCE_CODE]. A and B are sampled stochastically; GIVEN_SOURCE_CODE is known to be in the hash pool. Before this change, a low-acceptance streak was sometimes encountered early, the entire hash pool was cleared, and there was no speed-up afterwards (see #19164 (comment)). With this change, only low-scored ngrams are pruned on a streak, and the (still useful) ngrams with a score at or above SCORE_THR remain in the hash pool.
Log