add conversation history summarization by blublinsky · Pull Request #2730 · openshift/lightspeed-service

blublinsky · 2026-02-03T13:32:04Z

Description

This PR is implemented as 2 commits

Prepares for the implementation of conversation summarization
Actual summarization implementation
Summarization e2e test

What this implementation is missing:

Configuration for entries_to_keep - this can be exposed in configuration. Is it important? It is currently 5.
Asynchronous summarization - a performance optimization, but it can be quite complex

Type of change

Related Tickets & Documents

Related Issue #
OLS-2500
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

openshift-ci · 2026-02-03T13:34:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xrajesh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

onmete

My understanding of how the summarization should work is:

Retrieve full conversation history (as we do today)
At prompt preparation time (in _prepare_prompt(), where limit_conversation_history() is called), check if history fits in available tokens
If it doesn't fit: summarize ALL messages via an LLM call, inject the summary into the system prompt
Store the summary in cache, replacing the original history
Next request: summary is retrieved as the "history" - it's small, always fits
This is a simple "summarize everything when needed" approach - no need to be clever about which messages to keep.
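The steps above can be sketched roughly as follows. This is only an illustration of the "summarize everything when needed" flow, not the code in this PR: `prepare_history`, `count_tokens`, the dict-based `cache`, and the `summarize` callable (standing in for the LLM call) are all hypothetical names, and the token count is a crude whitespace split rather than a real tokenizer.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def prepare_history(
    cache: dict[str, list[str]],
    conversation_id: str,
    available_tokens: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Return history that fits the budget, summarizing everything if it does not."""
    # Step 1: retrieve the full conversation history.
    history = cache.get(conversation_id, [])

    # Step 2: check whether the history fits in the available tokens.
    if sum(count_tokens(message) for message in history) <= available_tokens:
        return history

    # Step 3: summarize ALL messages (the callable stands in for an LLM call).
    summary = summarize(history)

    # Step 4: store the summary in the cache, replacing the original history.
    cache[conversation_id] = [summary]

    # Step 5: on the next request the summary is what gets retrieved as "history".
    return [summary]
```

With this shape, a second request for the same conversation finds only the summary in the cache and returns it without another summarization call, as long as the summary itself fits the budget.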

Essentially, what we are looking for is to replace this line https://github.com/openshift/lightspeed-service/blob/main/ols/src/query_helpers/docs_summarizer.py#L236, with summarization feature.

The PR's approach tries to optimize before fetching (limit what we retrieve), but with summarization, this optimization becomes unnecessary.
Once we summarize, the history is replaced with a compact summary. There's no scenario where we have "too many messages to fetch" because either:

history hasn't been summarized yet (small enough to fetch)
history was summarized (only summary exists)

Is my understanding reasonable? Is there a scenario that discards this? Can we try to not add more responsibilities to (already bloated) docs summarizer? :)

blublinsky · 2026-02-03T16:43:21Z

My understanding of how the summarization should work is:

Retrieve full conversation history (as we do today)
Not quite. We retrieve only part of the history, based on the token budget. During retrieval we check whether the history is too large, and we will use that as the signal for summarization.

At prompt preparation time (in _prepare_prompt(), where limit_conversation_history() is called), check if history fits in available tokens
It is checked immediately after retrieval

If it doesn't fit: summarize ALL messages via an LLM call, inject the summary into the system prompt
This is the next step.

Store the summary in cache, replacing the original history
This is also the next step.

Next request: summary is retrieved as the "history" - it's small, always fits

This is a simple "summarize everything when needed" approach - no need to be clever about which messages to keep.

Essentially, what we are looking for is to replace this line https://github.com/openshift/lightspeed-service/blob/main/ols/src/query_helpers/docs_summarizer.py#L236, with summarization feature.

The PR's approach tries to optimize before fetching (limit what we retrieve), but with summarization, this optimization becomes unnecessary. Once we summarize, the history is replaced with a compact summary. There's no scenario where we have "too many messages to fetch" because either:
It actually is - it's our defence mechanism

history hasn't been summarized yet (small enough to fetch)

history was summarized (only summary exists)

Is my understanding reasonable? Is there a scenario that discards this? Can we try to not add more responsibilities to (already bloated) docs summarizer? :)

Summary.
What is done in this PR:

The read is optimized and returns a signal that the history is too large, instead of checking its size every time. This is moved to the docs summarizer because that is where we compute the available token budget.
The actual summarization is a simple async function that is trivial to implement on top of this.
These 2 are split to make the PR smaller.
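A minimal sketch of a budget-limited read that also reports whether anything was left out might look like this. It is an assumption-laden illustration, not the PR's implementation: `retrieve_within_budget` is a hypothetical name, tokens are approximated by whitespace-separated words, and keeping the newest messages first is an assumed policy.

```python
def retrieve_within_budget(
    history: list[str], token_budget: int
) -> tuple[list[str], bool]:
    """Return (messages that fit, truncated flag) for a token budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent messages are kept (assumed policy).
    for message in reversed(history):
        cost = len(message.split())  # hypothetical token count, not a real tokenizer
        if used + cost > token_budget:
            # The budget is exhausted: report truncation as the summarization signal.
            return list(reversed(kept)), True
        kept.append(message)
        used += cost
    # Everything fit, so no summarization is needed.
    return list(reversed(kept)), False
```

The boolean in the return value is what a caller could use to decide whether to trigger summarization, instead of measuring the history size separately on every request.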

blublinsky · 2026-02-03T20:34:39Z

/retest

onmete · 2026-02-04T13:50:44Z

ols/src/query_helpers/docs_summarizer.py

+Conversation history:
+{full_conversation}
+
+Summary:"""


"Please" is probably a waste of tokens :P

I found this prompt somewhere:

You are an expert conversation summarizer. Your job is to create detailed, comprehensive summaries of chat conversations.

Your summary should include:
- What were the main subjects covered?
- Any agreements, choices, or conclusions made
- Revealed preferences, likes, dislikes, or constraints
- Significant Q&A exchanges
- Tasks mentioned or to be completed

Be comprehensive but concise. Focus on information that would be valuable for continuing the conversation later. Write in a natural, narrative style that another AI can easily understand and use as context.

Do not include:
- Pleasantries or greetings unless they reveal something important
- Repetitive information
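For illustration, a prompt like this could be combined with the `Conversation history: {full_conversation}` / `Summary:` tail from the diff along these lines. This is a sketch under assumptions: `SUMMARY_PROMPT_TEMPLATE`, `build_summary_prompt`, and the `role: text` message formatting are hypothetical, and the instruction text is shortened here.

```python
# Hypothetical template: shortened instructions followed by the tail seen in the diff.
SUMMARY_PROMPT_TEMPLATE = """You are an expert conversation summarizer.
Be comprehensive but concise.

Conversation history:
{full_conversation}

Summary:"""


def build_summary_prompt(messages: list[tuple[str, str]]) -> str:
    """Render (role, text) pairs into the summarization prompt."""
    # One line per message; the "role: text" layout is an assumption for this sketch.
    full_conversation = "\n".join(f"{role}: {text}" for role, text in messages)
    # Only the template is parsed by str.format, so braces inside messages are safe.
    return SUMMARY_PROMPT_TEMPLATE.format(full_conversation=full_conversation)
```

Ending the prompt with `Summary:` nudges the model to start writing the summary directly rather than restating the instructions.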

blublinsky · 2026-02-06T11:45:26Z

/retest

blublinsky · 2026-02-06T18:43:08Z

/retest

blublinsky · 2026-02-11T09:03:01Z

/retest

blublinsky · 2026-02-11T09:33:23Z

/retest

blublinsky · 2026-02-11T11:07:44Z

/retest

blublinsky · 2026-02-13T15:30:21Z

/override "ci/prow/ols-evaluation"
Granite issue

openshift-ci · 2026-02-13T15:30:40Z

@blublinsky: Overrode contexts on behalf of blublinsky: ci/prow/ols-evaluation

Details

In response to this:

/override "ci/prow/ols-evaluation"
Granite issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2026-02-13T15:30:42Z

@blublinsky: all tests passed!

openshift-ci bot requested review from raptorsun and xrajesh February 3, 2026 13:33

blublinsky force-pushed the history-retrival branch 3 times, most recently from ceb314a to 9b9a793 Compare February 3, 2026 15:03

onmete reviewed Feb 3, 2026

blublinsky force-pushed the history-retrival branch from 9b9a793 to dad939c Compare February 3, 2026 16:23

blublinsky force-pushed the history-retrival branch from dad939c to e402813 Compare February 4, 2026 08:38

blublinsky changed the title ~~Refactor history retrieval in preparation to summarization~~ add conversation history summarization Feb 4, 2026

onmete reviewed Feb 4, 2026

blublinsky force-pushed the history-retrival branch from 5b5f910 to 9679bdc Compare February 4, 2026 14:23

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2026

blublinsky force-pushed the history-retrival branch from 9679bdc to b377810 Compare February 4, 2026 14:33

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2026

blublinsky force-pushed the history-retrival branch 4 times, most recently from 928ce14 to 760470a Compare February 6, 2026 09:15

blublinsky force-pushed the history-retrival branch 4 times, most recently from 36c48ac to 6dd787b Compare February 10, 2026 13:37

Refactor history retrieval in preparation to summarization

8a71cd1

blublinsky force-pushed the history-retrival branch from 6dd787b to 8a71cd1 Compare February 13, 2026 11:55
