Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retest |
onmete
left a comment
There was a problem hiding this comment.
A couple of high-level items.
| top_k: Number of tools to retrieve | ||
| threshold: Minimum similarity threshold for filtering results | ||
| """ | ||
| self.embed_model = embed_model or "sentence-transformers/all-mpnet-base-v2" |
There was a problem hiding this comment.
This shouldn't be configurable for now. It needs to use the hardcoded value of embedding that we have baked in the image.
There was a problem hiding this comment.
Why not. We already have this config integrated with the operator
There was a problem hiding this comment.
Because it would fail in disconnected environments. Why do we need to support the embedding model switch?
ols/utils/mcp_utils.py
Outdated
| user_token, | ||
| client_headers, | ||
| allowed_tool_names=filtered_tool_names, | ||
| ) |
There was a problem hiding this comment.
After this method identifies which tools are relevant, it opens new MCP connections to re-fetch those same tools (_gather_and_populate_tools with allowed_tool_names). Why not cache the StructuredTool objects from the initial population and return those directly?
There was a problem hiding this comment.
Good question.
The issue is that tools have authentication baked in. We need to refetch them with a new authentication to make calls correct
ols/utils/config.py
Outdated
| Without this lock, one user's client tools could leak into another user's | ||
| filtered results. | ||
| """ | ||
| return asyncio.Lock() |
There was a problem hiding this comment.
So the lock is here basically because of the requests with client headers, right?
Client headers are an edge case - a minority of requests, for a minority of server configurations. And the current OLS (UI) is not supporting that.
I really don't like that it affects the architecture of implementation this much.
See this scenario:
- Index contains:
get_pods(k8s-auth) +github_create_pr(client-auth) - Query: "create a pull request"
- ToolsRAG returns:
github_create_pr(most relevant) - Client didn't send GitHub headers this time
- Tool gets dropped -> empty result
The user gets nothing, even thoughget_podswas in the index and accessible. The most relevant tool won the ranking but was inaccessible, and it pushed out the tools that were actually available.
I think we need static index for tools that are ALWAYS accessible and ANOTHER one for client tools which will be added based on the request headers (and then combine/filter results).
Then both indexes are static, hence no need for lock.
There was a problem hiding this comment.
The lock is only for mcp_headers. If they are not present, there is no lock. So if you consider mcp_headers an edge case, so is the lock
| self.alpha = alpha | ||
| self.top_k = top_k | ||
| self.threshold = threshold | ||
| self.embedding_model = SentenceTransformer(embed_model) |
There was a problem hiding this comment.
We are creating a second instance of the same embedding model we have for RAG. We need to reuse that or at least confirm the library itself is handling that, to not load another ~400MB into memory.
There was a problem hiding this comment.
fixed in config
89e5515 to
b1b66d8
Compare
|
/retest |
f7d057c to
70904be
Compare
|
/retest |
1 similar comment
|
/retest |
805285d to
8aa9c0a
Compare
|
/retest |
5e4f971 to
5b0818a
Compare
|
/retest |
9e3954b to
14a9bac
Compare
|
/override "ci/prow/e2e-ols-cluster" |
|
@blublinsky: Overrode contexts on behalf of blublinsky: ci/prow/e2e-ols-cluster DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
14a9bac to
64a169a
Compare
|
@blublinsky: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
Type of change
Related Tickets & Documents
OLS-2275
OLS-2275
Checklist before requesting a review
Testing