Adjusting MCP header processing to llama stack 0.4.x and adding MCP e2e tests #1141
blublinsky wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough
Adds JSON-RPC MCP endpoints (initialize, tools/list, tools/call) to the mock server, changes MCP header/authorization handling and logging across the app, replaces runtime MCP registration with config-driven logging, expands MCP test configs and E2E tests, and updates docker-compose ports/entrypoints and unit tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant LightspeedStack
    participant MCPMockServer
    participant LlamaStack
    Client->>LightspeedStack: Send query (may include MCP-HEADERS)
    LightspeedStack->>LightspeedStack: prepare_tools(): resolve per-server headers, build MCP tool defs (attach authorization)
    LightspeedStack->>MCPMockServer: POST JSON-RPC "tools/list"
    MCPMockServer-->>LightspeedStack: return tools array
    LightspeedStack->>LlamaStack: Call Responses API with MCP tools
    LlamaStack->>MCPMockServer: POST JSON-RPC "tools/call" (when tool invoked)
    MCPMockServer-->>LlamaStack: tool result or error
    LlamaStack-->>LightspeedStack: response containing tool_results
    LightspeedStack-->>Client: final aggregated response (streaming or non-streaming)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 860-874: The step implementation for
check_streaming_response_has_tool_results only asserts status_code == 200 and
duplicates check_streaming_response_successful; update the step to either
implement real SSE parsing of context.response to assert presence of tool
execution events (parse Server-Sent Events, inspect event types/payloads for
tool results) or at minimum add a clear TODO/FIXME comment and change the
assertion to fail if SSE is expected but not parsed (e.g., mark test as xfail or
raise NotImplementedError), referencing
check_streaming_response_has_tool_results and
check_streaming_response_successful so reviewers can find and reconcile the
duplicate behavior.
- Around line 775-806: The streaming request in
send_streaming_query_with_mcp_tools sets stream=True but never consumes or
closes the response, leaking connections; fix by ensuring context.response is
closed when you don't need the stream (either call response.close() before
returning from send_streaming_query_with_mcp_tools or change to stream=False if
streaming isn't required), or add a global cleanup (e.g., after_scenario hook
that grabs getattr(context, "response", None) and calls close()) to guarantee
closure; apply the same fix to the other streaming helper referenced around
lines 809-843 so no requests.Response objects remain unclosed.
- Around line 581-641: Four step functions (check_response_has_tool_results,
check_successful_tool_execution, check_response_has_tool_info,
check_tool_results_in_response) all perform the same minimal assertions, which
masks real tool-verification; either consolidate them into one reusable step or
strengthen each with tool-specific checks (e.g., assert presence of
"tool_calls", "tool_results", or expected keys inside response["response"], or
validate a tool call entry structure/content). Update the functions to call a
shared helper (e.g., assert_tool_response_contains(response_data,
required_keys)) or replace duplicate bodies with distinct assertions that check
tool-specific fields or content substrings so the steps actually validate tool
execution and call metadata.
- Around line 530-557: The test step function check_tools_list_calls currently
issues its own tools/list JSON-RPC request (payload) to prove the mock is alive;
change it to inspect the MCP mock server's recorded request/log endpoint (use
mock_server_url + the mock's requests or debug API) and assert that at least one
recorded request has method == "tools/list"; specifically replace the
requests.post(payload) call with a request to the mock server's
recorded-requests API, parse the returned request entries for entry["method"] or
entry["jsonrpc"]["method"] == "tools/list", and raise AssertionError if none
found while keeping the same timeout and RequestException handling.
🧹 Nitpick comments (16)
docker-compose-library.yaml (1)
69-73: Hardcoded test secret in entrypoint — acceptable for test-only compose files, but verify it won't leak into production images.

The token `test-secret-token-123` is written at container startup. This is fine for a test-only docker-compose file, but ensure this compose file is never used in production or CI pipelines that deploy to shared environments.

One minor note: if the Python process crashes on startup, the exit code is swallowed by `bash -c`. Consider adding `set -e` or `exec` before the Python command so the container properly reports failure.

Suggested improvement:

```diff
 entrypoint: >
   /bin/bash -c "
   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-  /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+  exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
   "
```

dev-tools/mcp-mock-server/server.py (4)
63-67: Excessive pylint suppressions suggest the method needs decomposition.

`do_POST` has four pylint disables (`invalid-name`, `too-many-locals`, `too-many-branches`, `too-many-statements`). For a test mock this is acceptable, but consider extracting the method-dispatch logic into helper methods (e.g., `_handle_initialize`, `_handle_tools_list`, `_handle_tools_call`) to reduce complexity.

84-86: Fragile mutation of `request_log[-1]` — relies on ordering with `_capture_headers`.

This appends `rpc_method` to the last entry in `request_log`, which was just added by `_capture_headers()` on line 69. If `_capture_headers` ever changes to not append (e.g., due to an exception or refactor), this would raise an `IndexError` or annotate the wrong entry. For a test mock, this is low risk.

97-113: `match True` pattern is unconventional — consider using `if`/`elif` instead.

The `match True: case _ if ...` pattern works but reads oddly compared to a straightforward `if`/`elif` chain, which would be more idiomatic for this type of conditional dispatch.

235-239: `/debug/clear` endpoint uses GET for a mutating operation.

Using GET for state mutation violates HTTP semantics. For a test-only mock server this is fine, but a POST or DELETE would be more conventional. No action needed unless this mock is reused more broadly.
tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)
21-25: New `conversation_cache` block added — appears unrelated to MCP changes.

This sqlite conversation cache addition seems orthogonal to the MCP header processing changes in this PR. If it's needed for the MCP tests specifically, a comment explaining why would be helpful. Otherwise, consider splitting it into a separate commit/PR for cleaner change tracking.
src/utils/responses.py (2)
139-139: Increasing pylint complexity suppressions — the function is growing.

`prepare_tools` now suppresses `too-many-locals` and `too-many-branches`. Most of the added lines are debug logging. This is acceptable for now, but if more logic is added in the future, consider extracting the logging/summary block (lines 189-235) into a helper.

453-485: Header resolution skip logic is correct but the logging reconstruction is complex.

The resolved/missing header computation for the warning message (lines 458-471) handles case-insensitive Authorization matching well, but the logic is intricate for what's essentially a log message. Consider simplifying by tracking resolved header names as you go (in the resolution loop above) rather than reconstructing them after the fact, as sketched below.
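A rough sketch of the "track as you go" idea. The function shape, parameter names, and the token-resolution callback are hypothetical stand-ins for the real `prepare_tools` internals, not the actual implementation:

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger(__name__)


def resolve_headers(
    server_name: str,
    configured: Mapping[str, str],
    get_token_value: Callable[[str, str], str | None],
) -> dict[str, str] | None:
    """Resolve headers, recording resolved/missing names during the loop itself."""
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for name, value in configured.items():
        token = get_token_value(value, name)
        if token is None:
            missing.append(name)
        else:
            resolved[name] = token
    if missing:
        # The warning is built from state gathered in the loop, so no second pass
        # over the configuration is needed to reconstruct what was resolved.
        logger.warning(
            "Skipping MCP server %s: resolved %s, missing %s",
            server_name, sorted(resolved), missing,
        )
        return None
    return resolved
```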
docker-compose.yaml (1)
87-91: Hardcoded Python version in entrypoint path.

The path `/app-root/.venv/bin/python3.12` is brittle — if the base image upgrades Python, this will break silently. Consider using `/app-root/.venv/bin/python` or `/app-root/.venv/bin/python3` instead.

Suggested change:

```diff
 entrypoint: >
   /bin/bash -c "
   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-  /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+  /app-root/.venv/bin/python3 /app-root/src/lightspeed_stack.py
   "
```

src/app/main.py (2)
23-24: Remove commented-out import.

This commented-out import is dead code. If the Responses API approach is confirmed, remove it entirely rather than leaving breadcrumbs.

Suggested change:

```diff
-
-# from utils.common import register_mcp_servers_async  # Not needed for Responses API
```

94-98: Remove commented-out registration code.

Lines 94-98 contain commented-out code that duplicates the intent already expressed by the NOTE comment on lines 94-95. The old function calls on lines 96-98 should be removed.

Suggested change:

```diff
 # NOTE: MCP server registration not needed for Responses API
 # The Responses API takes inline tool definitions instead of pre-registered toolgroups
-# logger.info("Registering MCP servers")
-# await register_mcp_servers_async(logger, configuration.configuration)
-# get_logger("app.endpoints.handlers")
```

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
20-24: Conversation cache addition appears unrelated to MCP changes.

This `conversation_cache` block seems orthogonal to the MCP work. If intentional, consider noting it in the PR description.

tests/e2e/features/mcp_tools.feature (1)
58-64: Consider verifying the negative test more explicitly.

The "skipped when MCP-HEADERS is missing" scenario (line 58) validates via tool log contents (exactly `mock_tool_file`, `mock_tool_k8s`), which implicitly asserts that `mock_tool_client` is absent. This is acceptable, but adding an explicit "should NOT contain tool mock_tool_client" step would make the intent clearer to future readers.

tests/e2e/features/steps/mcp.py (3)
51-84: Extract a shared helper to reduce query-sending duplication.

Five functions (`send_query_with_mcp_tools`, `send_query_about_tools`, `send_query_requesting_tool_usage`, `send_query_triggering_tool_usage`, `send_query_requiring_multiple_tools`) follow the same pattern: initialize `auth_headers`, build URL, POST payload, handle exception. They differ only in query text and timeout.

Also note that `send_query_with_mcp_tools` (line 64) omits `model`/`provider` from the payload while the other four include them — verify this is intentional.

♻️ Suggested helper:

```python
def _send_query(
    context: Context,
    query: str,
    *,
    timeout: int = 30,
    include_model: bool = True,
    stream: bool = False,
) -> None:
    """Common helper for sending query requests."""
    if not hasattr(context, "auth_headers"):
        context.auth_headers = {}
    base_url = f"http://{context.hostname}:{context.port}"
    endpoint = "streaming_query" if stream else "query"
    url = f"{base_url}/v1/{endpoint}"
    payload: dict = {"query": query}
    if include_model:
        payload["model"] = getattr(context, "default_model", "gpt-4o-mini")
        payload["provider"] = getattr(context, "default_provider", "openai")
    try:
        context.response = requests.post(
            url,
            json=payload,
            headers=context.auth_headers,
            timeout=timeout,
            stream=stream,
        )
    except requests.RequestException as e:
        print(f"❌ Query request failed: {e}")
        context.response = None
```

Each step then becomes a one-liner, e.g.:

```python
@when("I send a query that uses MCP tools")
def send_query_with_mcp_tools(context: Context) -> None:
    _send_query(context, "What tools are available?", include_model=False)
```

Also applies to: 382-479, 493-527
98-150: Extract a helper for fetching the mock server request log.

The pattern of GET `/debug/requests` → assert 200 → parse JSON → handle `RequestException` is duplicated across ~10 functions (`check_mcp_server_received_requests`, `check_mcp_server_request_count`, `check_file_auth_header`, `check_k8s_auth_header`, `check_client_auth_header`, `check_mcp_tool_in_log`, `check_mcp_tool_not_in_log`, `check_mcp_exact_tools_in_log`, `check_tools_call_method`, `check_multiple_tool_calls`).

♻️ Suggested helper:

```python
def _get_mock_request_log() -> list:
    """Fetch the request log from the MCP mock server's debug endpoint."""
    try:
        response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
        assert response.status_code == 200, (
            f"Failed to get debug requests: {response.status_code}"
        )
        log = response.json()
        assert isinstance(log, list), f"Expected list, got {type(log)}"
        return log
    except requests.RequestException as e:
        raise AssertionError(f"Could not connect to MCP mock server: {e}") from e
```

Each step then just calls `_get_mock_request_log()` and runs its specific assertion, e.g.:

```python
@then("The MCP mock server should have received requests")
def check_mcp_server_received_requests(context: Context) -> None:
    log = _get_mock_request_log()
    assert len(log) > 0, "MCP mock server received no requests"
```
276-282: Prefer `any()` over a manual loop-and-flag.

♻️ Suggested simplification:

```diff
-    found = False
-    for req in requests_log:
-        if req.get("tool_name") == tool_name:
-            found = True
-            break
-
-    assert found, f"Tool '{tool_name}' not found in mock server request log"
+    assert any(
+        req.get("tool_name") == tool_name for req in requests_log
+    ), f"Tool '{tool_name}' not found in mock server request log"
```
(force-pushed 89e3ca0 → 65219bf)
src/app/endpoints/streaming_query.py (outdated)

```python
mcp_server_endpoints = [
    (t.get("server_label"), t.get("server_url"))
    for t in responses_params.tools
    if t.get("type") == "mcp"
]
```

It is repeated a few lines above; the filtering can be performed once, as sketched below.
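One way to remove the duplication is to compute the endpoint list a single time and reuse it for both the emptiness check and the log line. A hedged sketch, with the tool-dict shape taken from the excerpt above and everything else (function name, sample data, logger) purely illustrative:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def mcp_endpoints(tools: list[dict[str, Any]] | None) -> list[tuple[Any, Any]]:
    """Filter MCP tool definitions once so the result can be reused everywhere."""
    return [
        (t.get("server_label"), t.get("server_url"))
        for t in (tools or [])
        if t.get("type") == "mcp"
    ]


# Usage sketch: compute once, then both the explicit emptiness check and the
# debug log reuse the same list instead of re-filtering.
endpoints = mcp_endpoints(
    [{"type": "mcp", "server_label": "mock", "server_url": "http://localhost:9000"}]
)
if len(endpoints) > 0:
    logger.debug("MCP servers passed to Responses API: %s", endpoints)
```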
src/app/endpoints/streaming_query.py (outdated)

```python
)
# Retrieve response stream (may raise exceptions)
# Log request details before calling Llama Stack (MCP debugging)
if responses_params.tools:
```

The condition should be explicit; what exactly does it mean written like this?
src/app/endpoints/streaming_query.py (outdated)

```python
    for t in responses_params.tools
    if t.get("type") == "mcp"
]
if mcp_server_endpoints:
```

Again, the condition should be explicit.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@tests/e2e/configuration/library-mode/lightspeed-stack-mcp.yaml`:
- Around line 49-50: Add a POSIX newline at EOF: open the YAML file and ensure
the final line (the mapping under authorization_headers, e.g. X-API-Token:
"client") is followed by a single trailing newline character so the file ends
with a newline.
In `@tests/e2e/features/environment.py`:
- Around line 179-182: The cleanup currently deletes context.auth_headers but
misses clearing scenario-specific context.expect_tool_errors (set by
configure_mock_server_errors) which affects timeout logic in
send_query_with_mcp_tools; update the teardown to also check for and delete
context.expect_tool_errors (similar to auth_headers) so that per-scenario state
does not leak between scenarios.
In `@tests/e2e/features/mcp_tools.feature`:
- Around line 48-56: The scenario "MCP mock server receives client-provided
token via MCP-HEADERS" promises verification of the client token but only
asserts tools called; add an assertion step that checks the recorded request's
Authorization header equals the expected token (e.g.
"my-k8s-token"/"my-client-token") after the existing tool assertions. Locate the
scenario and add a step similar to the other scenarios' header checks (see the
file-auth and k8s-auth scenarios) so the MCP mock server request log is queried
for the Authorization header from the client-auth request and asserted to
contain the expected token value.
In `@tests/e2e/features/steps/mcp.py`:
- Around line 555-573: The test check_tools_call_method currently only asserts
that requests_log has entries but doesn't verify any are for the "tools/call"
RPC; update check_tools_call_method to filter response.json() for entries where
entry.get("rpc_method") == "tools/call" (same approach as
check_tools_list_calls), then assert that the filtered list length is > 0 and
raise a clear assertion if none found; keep existing timeout/exception handling
and the initial GET to f"{mock_server_url}/debug/requests" but replace the
generic len(requests_log) check with the filtered check for "tools/call".
- Around line 51-84: The step function send_query_with_mcp_tools builds a
payload missing the "model" and "provider" keys causing inconsistent behavior
versus other send_* steps; update the payload in send_query_with_mcp_tools to
include the same "model" and "provider" fields used by the other helpers (see
send_query_about_tools, send_query_requesting_tool_usage,
send_query_triggering_tool_usage, send_query_requiring_multiple_tools) so the
request uses explicit model/provider values rather than defaults, preserving the
same keys and structure those functions send.
- Around line 697-718: The test check_multiple_tool_calls currently asserts on
total requests_log length which can be inflated; change it to filter the debug
requests for only tool-call entries (e.g., where each entry's method or path
indicates "tools/call") and assert the filtered list has length >= 3; update
variable names (e.g., tool_calls = [r for r in requests_log if r.get("method")==
"tools/call" or "tools/call" in r.get("path","")]) and use len(tool_calls) in
the assertion and the printed message so the test verifies actual tool call
requests.
- Around line 747-778: The test sends a request to /v1/streaming_query without
specifying media_type so the endpoint emits SSE (text/plain) and later
response.json() will fail; in send_streaming_query_with_mcp_tools update the
JSON payload to include "media_type": "application/json" (or alternatively
adjust the test assertions to parse SSE events instead of calling
response.json()); apply the same change to the other streaming test payloads in
this file that call /v1/streaming_query so they either include media_type or
handle SSE parsing in their assertion logic.
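A hedged sketch of the two options for the streaming tests described in the last item above. It assumes the endpoint emits standard `data:`-prefixed SSE lines with JSON payloads when `media_type` is not set; the host, port, and payload are placeholders, and the exact event format is not confirmed here:

```python
import json

import requests

URL = "http://localhost:8080/v1/streaming_query"  # placeholder host/port
PAYLOAD = {"query": "What tools are available?"}

# Option A: request JSON instead of SSE (assumes the endpoint honors media_type).
resp = requests.post(URL, json={**PAYLOAD, "media_type": "application/json"}, timeout=60)
data = resp.json()

# Option B: keep SSE and parse the event stream, closing the connection when done.
events = []
with requests.post(URL, json=PAYLOAD, timeout=60, stream=True) as stream_resp:
    for line in stream_resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            # Assumes each data payload is a JSON object; adjust if the format differs.
            events.append(json.loads(line[len("data: "):]))
```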
🧹 Nitpick comments (6)
dev-tools/mcp-mock-server/server.py (1)
97-113: Consider using `if`/`elif` instead of `match True:` for clarity.

The `match True: case _ if "..." in auth_header:` pattern is functionally correct but unconventional. A standard `if`/`elif` chain is more idiomatic and readable for simple string-contains checks. This is a test tool though, so it's a minor style nit.

tests/e2e/features/mcp_tools.feature (1)
119-130: Good CI timing documentation on slow scenarios.

The comments about CI timing variability (lines 119-120 and 142-143) are helpful for maintainers. Consider also tagging these scenarios with something like `@slow` so CI pipelines can optionally skip them in fast-feedback runs.

tests/e2e/features/steps/mcp.py (4)
10-11: Consider making the mock server URL configurable via environment variable.

`MOCK_MCP_SERVER_URL` is hardcoded to `localhost:9000`. Other test infrastructure in this repo (e.g., `E2E_LSC_HOSTNAME`, `E2E_LSC_PORT`) uses environment variables, which allows flexibility when running tests against different environments.

Suggested change:

```diff
-MOCK_MCP_SERVER_URL = "http://localhost:9000"
+MOCK_MCP_SERVER_URL = os.getenv("MCP_MOCK_SERVER_URL", "http://localhost:9000")
```

(Would also need `import os` at the top.)
321-339: Step name says "exactly" but the implementation allows a subset of the listed tools.

The Gherkin step text `The MCP mock server request log should contain exactly tools {tool_list}` implies all listed tools must be present. The actual implementation (lines 355-363) only requires at least one expected tool to have been called, and it does not assert that all expected tools were called. This is intentionally flexible per the docstring, but the word "exactly" in the step name is misleading.

Consider renaming to something like `should contain only tools from {tool_list}` to better reflect the "at-least-one from this set, nothing outside it" semantics.
721-744: Weak assertions: steps only verify HTTP 200 / key existence.

`check_all_tool_calls_succeeded` (lines 721-730) only asserts `status_code == 200`, and `check_response_has_all_results` (lines 733-744) only asserts the `response` key exists. Neither verifies anything tool-specific. These pass for any successful query, including one that uses no tools.

If stronger assertions aren't feasible yet (due to LLM response non-determinism), consider adding a `# TODO` noting the gap so it's tracked.
153-253: Consider extracting a shared helper for `/debug/requests` fetching.

`check_file_auth_header`, `check_k8s_auth_header`, and `check_client_auth_header` (and many other steps) all repeat the same pattern: GET `/debug/requests` with a 15s timeout, assert 200, parse JSON, then filter. A small helper would reduce duplication:

```python
def _get_mock_request_log() -> list:
    """Fetch the MCP mock server request log."""
    response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
    assert response.status_code == 200, f"Failed to get debug requests: {response.status_code}"
    return response.json()
```

This would simplify each step to a single call plus domain-specific filtering.
```yaml
authorization_headers:
  X-API-Token: "client"
```
(no newline at end of file)

Missing trailing newline at end of file.

Most POSIX tools and linters expect files to end with a newline character.

Proposed fix: add a single trailing newline after the last line:

```yaml
authorization_headers:
  X-API-Token: "client"

```
tests/e2e/features/environment.py (outdated)

```python
# Clear auth headers to prevent leakage between scenarios
# Context attributes persist across scenarios unless explicitly cleared
if hasattr(context, "auth_headers"):
    del context.auth_headers
```
Good: auth_headers cleanup prevents cross-scenario leakage.
This correctly addresses the Behave Context persistence issue — custom attributes persist across scenarios unless explicitly cleared. Based on learnings, "In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared."
However, context.expect_tool_errors (set in mcp.py:configure_mock_server_errors, Line 640) is also scenario-specific state that isn't cleared here. It affects timeout selection in send_query_with_mcp_tools (Line 70–74). Consider clearing it alongside auth_headers.
Proposed fix:

```diff
 # Clear auth headers to prevent leakage between scenarios
 # Context attributes persist across scenarios unless explicitly cleared
 if hasattr(context, "auth_headers"):
     del context.auth_headers
+if hasattr(context, "expect_tool_errors"):
+    del context.expect_tool_errors
```
In `@tests/e2e/features/environment.py` around lines 179 - 182, The cleanup
currently deletes context.auth_headers but misses clearing scenario-specific
context.expect_tool_errors (set by configure_mock_server_errors) which affects
timeout logic in send_query_with_mcp_tools; update the teardown to also check
for and delete context.expect_tool_errors (similar to auth_headers) so that
per-scenario state does not leak between scenarios.
```gherkin
Scenario: MCP mock server receives client-provided token via MCP-HEADERS
  Given The system is in default state
  And I set the Authorization header to Bearer my-k8s-token
  And I set the MCP-HEADERS header with client token for "mock-client-auth"
  And The MCP mock server request log is cleared
  When I send a query that uses MCP tools
  And I wait for MCP server to receive requests
  Then The MCP mock server should have received requests
  And The MCP mock server request log should contain exactly tools mock_tool_file, mock_tool_k8s, mock_tool_client
```
Scenario title promises token verification but only checks tool presence.
The scenario is named "MCP mock server receives client-provided token via MCP-HEADERS", implying the client token value is verified. However, the Then step only asserts which tools were called — it doesn't check the actual Authorization header value like the file-auth (Line 37) and k8s-auth (Line 46) scenarios do.
Consider adding a step like:
And The MCP mock server should have captured Authorization header containing "my-client-token" from client-auth server
| @then("The MCP mock server should have received multiple tools/call methods") | ||
| def check_multiple_tool_calls(context: Context) -> None: | ||
| """Verify MCP server received multiple tool call requests. | ||
|
|
||
| Parameters: | ||
| context (Context): Behave context. | ||
| """ | ||
| mock_server_url = MOCK_MCP_SERVER_URL | ||
|
|
||
| try: | ||
| # Mock server debug endpoint can be slow with many requests - use 15s timeout | ||
| response = requests.get(f"{mock_server_url}/debug/requests", timeout=15) | ||
| assert response.status_code == 200, "Failed to get debug requests" | ||
|
|
||
| requests_log = response.json() | ||
| # We expect multiple requests (at least for discovery + calls) | ||
| assert ( | ||
| len(requests_log) >= 3 | ||
| ), f"Expected multiple requests, got {len(requests_log)}" | ||
| print(f"✅ MCP server received {len(requests_log)} requests (multiple tools)") | ||
| except requests.RequestException as e: | ||
| raise AssertionError(f"Could not connect to MCP mock server: {e}") from e |
There was a problem hiding this comment.
check_multiple_tool_calls doesn't filter for tools/call method.
Same issue as check_tools_call_method — it asserts len(requests_log) >= 3 on total request count rather than filtering for tools/call entries specifically. Initialization and discovery requests inflate the count, so the assertion can pass without any actual tool calls.
Proposed fix
```diff
     requests_log = response.json()
-    # We expect multiple requests (at least for discovery + calls)
-    assert (
-        len(requests_log) >= 3
-    ), f"Expected multiple requests, got {len(requests_log)}"
-    print(f"✅ MCP server received {len(requests_log)} requests (multiple tools)")
+    tools_call_entries = [
+        req for req in requests_log if req.get("rpc_method") == "tools/call"
+    ]
+    assert (
+        len(tools_call_entries) >= 2
+    ), f"Expected multiple tools/call requests, got {len(tools_call_entries)}"
+    print(f"✅ MCP server received {len(tools_call_entries)} tools/call request(s)")
```
+ print(f"✅ MCP server received {len(tools_call_entries)} tools/call request(s)")🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/mcp.py` around lines 697 - 718, The test
check_multiple_tool_calls currently asserts on total requests_log length which
can be inflated; change it to filter the debug requests for only tool-call
entries (e.g., where each entry's method or path indicates "tools/call") and
assert the filtered list has length >= 3; update variable names (e.g.,
tool_calls = [r for r in requests_log if r.get("method")== "tools/call" or
"tools/call" in r.get("path","")]) and use len(tool_calls) in the assertion and
the printed message so the test verifies actual tool call requests.
(force-pushed 65219bf → ba8bd3c)
tests/e2e/features/environment.py (outdated)

```python
restart_container("lightspeed-stack")

# Clear auth headers to prevent leakage between scenarios
# Context attributes persist across scenarios unless explicitly cleared
```
I do not see a point for this logic; the context is always cleared at the start of a new scenario, so the comment is wrong and the logic is redundant.
(force-pushed b82eb01 → 387b1c5)
tests/e2e/features/mcp_tools.feature (outdated)

```gherkin
And The body of the response has proper client auth options structure
And The response contains server "mock-client-auth" with client auth header "Authorization"

Scenario: Service reports MCP configuration correctly
```
I am completely missing the point of this scenario. If the title is correct, then I would expect a verification step here to confirm the existence of the reported info; a status of 200 does not tell me this.
```gherkin
Scenario: MCP mock server receives file-based static token
  Given The system is in default state
  And I set the Authorization header to Bearer my-k8s-token
  And The MCP mock server request log is cleared
```
This step is just an implementation detail; please include it in some other step definition.
It's not redundant - it's intentional and makes the test more maintainable.
```gherkin
Then The MCP mock server should have received requests
And The MCP mock server should have captured Authorization header "Bearer test-secret-token-123" from file-auth server

Scenario: MCP mock server receives kubernetes token from request
```
This scenario is identical in content to the previous one, yet it expects a different result. Or do we have three scenarios that verify the same workflow but with different outcomes?
The scenarios ARE different:
Different configuration - Each MCP server is configured differently (file path vs "kubernetes" vs "client")
Different expected token values - Line 31 vs Line 40 check different tokens
Scenario 3 has extra setup (line 45: I set the MCP-HEADERS header)
Scenario 3 has different verification (line 50: checks which tools were called, not just the token)
```gherkin
Given The system is in default state
And I set the Authorization header to Bearer my-k8s-token
And The MCP mock server request log is cleared
When I send a query that uses MCP tools
```
There is already a test step implemented that uses the query endpoint; what is the reason to define a new one here?
Each query step uses intentionally different wording to trigger specific LLM behaviors (tool discovery vs. single-tool execution vs. multiple-tool execution) and includes scenario-specific configuration (different timeouts, error handling flags), making them distinct test fixtures rather than duplicates - consolidating them would require complex parameterization that would reduce test clarity without meaningful benefit.
```python
@then("The MCP mock server should have received at least {count:d} requests")
def check_mcp_server_request_count(context: Context, count: int) -> None:
    """Verify the MCP mock server received at least N requests.
```
Why are there two steps for the same behavior? This step and the previous one can be merged, since calling this one with 1 achieves the same behavior as the previous step.
```gherkin
Then The MCP mock server should have received requests
And The MCP mock server should have captured Authorization header containing "my-k8s-token" from k8s-auth server

Scenario: MCP mock server receives client-provided token via MCP-HEADERS
```
This scenario and "All three MCP auth types work in a single request" are identical; what is the specific reason to have both?
We are checking which MCP servers are invoked based on the headers provided.
(force-pushed 387b1c5 → e396b36)
```diff
@@ -0,0 +1,190 @@
+@MCP
```
If the test steps are identical, then please merge these into one scenario.
sorry, what does this mean?
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 568-605: The keyword check in check_response_has_tool_execution is
too broad and can give false positives; update it to look for specific
identifiers and structured fields: inspect response_data for keys like
"tool_calls" and "tool_results" and treat the presence of non-empty lists/dicts
there as success, and additionally search the lowercased response text for
concrete mock tool names such as "mock_tool_file" and "mock_tool_k8s" (instead
of generic words "tool", "mock", "executed", "success"); ensure the assertion
prefers structured-field checks first and falls back to these specific name
checks only if those fields are absent.
- Around line 715-738: Both step functions are too weak: update
check_all_tool_calls_succeeded to assert the mock server actually received
tools/call requests (e.g., inspect context.mock_server.request_log or
context.mock_server.requests and assert any entry.path contains "tools/call" and
count matches expected), and strengthen check_response_has_all_results to
validate response_data["response"] contains tool-specific results (e.g., verify
expected tool IDs/names from context.expected_tools or context.tool_calls are
present and each result has a success status or non-empty output). Use the
existing symbols check_all_tool_calls_succeeded, check_response_has_all_results,
context.response, and response_data to locate and implement these assertions.
🧹 Nitpick comments (11)
docker-compose-library.yaml (1)
69-73: Hardcoded Python interpreter path is brittle.

The entrypoint hardcodes `/app-root/.venv/bin/python3.12`. If the container's Python version is upgraded (e.g., to 3.13), this will silently fail at startup. Consider using `/app-root/.venv/bin/python3` or `/app-root/.venv/bin/python` instead for forward compatibility.

♻️ Suggested change:

```diff
 entrypoint: >
   /bin/bash -c "
   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-  /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+  /app-root/.venv/bin/python3 /app-root/src/lightspeed_stack.py
   "
```

dev-tools/mcp-mock-server/server.py (3)
28-30: Thread-safety note on shared global state.

`last_headers` and `request_log` are mutable globals accessed concurrently by HTTP and HTTPS server threads without synchronization. For a single-threaded-per-server test mock this is low risk, but concurrent requests (e.g., from parallel test scenarios) could produce corrupted state in `request_log`.

Acceptable for a dev/test tool, but worth a comment for future maintainers; a lock-based variant is sketched below.
97-113: `match True` with guard clauses is non-idiomatic.

The `match True: case _ if "..." in auth_header:` pattern works but is an unusual substitute for `if`/`elif`/`else`. A simple `if`/`elif`/`else` chain would be clearer and more familiar to most Python developers.

♻️ Suggested simplification:

```diff
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True
```
235-239: `/debug/clear` uses GET for a state-mutating operation.

Using GET for clearing state violates REST conventions (safe/idempotent). For a test tool this is pragmatic, but if other tests or monitoring accidentally hit this endpoint, it could silently wipe debug data. Consider using DELETE or POST instead.
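A small sketch of a DELETE-based alternative using the same `http.server` machinery the mock already relies on; the handler class, path, and `request_log` list here are stand-ins, not the mock's actual names:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

request_log: list[dict] = []  # stand-in for the mock's module-level log


class DebugHandler(BaseHTTPRequestHandler):
    """Illustrative handler: clear recorded state with DELETE instead of GET /debug/clear."""

    def do_DELETE(self) -> None:  # pylint: disable=invalid-name
        if self.path == "/debug/requests":
            request_log.clear()
            self.send_response(204)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()


# Usage sketch: HTTPServer(("localhost", 9000), DebugHandler).serve_forever()
```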
src/utils/responses.py (2)
378-396: Nested `_get_token_value` captures loop variable `mcp_server` — safe here but fragile.

The nested function `_get_token_value` reads `mcp_server` from the enclosing `for` loop scope (line 390). This is safe because the function is called synchronously within the same iteration, but it's easy for a future refactor (e.g., making it a callback or storing it) to introduce a late-binding closure bug. Consider passing `mcp_server` as a parameter instead.

♻️ Suggested change:

```diff
-    def _get_token_value(original: str, header: str) -> str | None:
-        """Convert to header value."""
+    def _get_token_value(original: str, header: str, server: ModelContextProtocolServer) -> str | None:
+        """Convert to header value."""
         match original:
             case constants.MCP_AUTH_KUBERNETES:
                 # use k8s token
                 if token is None or token == "":
                     return None
                 return f"Bearer {token}"
             case constants.MCP_AUTH_CLIENT:
                 # use client provided token
                 if mcp_headers is None:
                     return None
-                client_headers = mcp_headers.get(mcp_server.name, None)
+                client_headers = mcp_headers.get(server.name, None)
                 if client_headers is None:
                     return None
                 return client_headers.get(header, None)
             case _:
                 # use provided
                 return original
```

Then update the call site at line 421:

```diff
-            resolved_value = _get_token_value(value, name)
+            resolved_value = _get_token_value(value, name, mcp_server)
```
453-485: Complex skip-server logic is hard to follow — consider extracting a helper.

The resolved-count comparison, case-insensitive header name recovery, and missing-header computation span 30+ lines with nested conditionals. Extracting this into a small `_check_required_headers_resolved(...)` helper would improve readability and testability.

src/app/main.py (1)
23-24: Remove the commented-out import instead of leaving it as dead code.

The commented-out `from utils.common import register_mcp_servers_async` line and the associated commented-out call (lines 96-98) should be removed entirely. The explanatory comment `# Not needed for Responses API` could be a standalone note if needed, but dead import lines tend to accumulate.

♻️ Suggested cleanup:

```diff
-
-# from utils.common import register_mcp_servers_async  # Not needed for Responses API
 from utils.llama_stack_version import check_llama_stack_version
```

And at lines 94-98:

```diff
-    # NOTE: MCP server registration not needed for Responses API
-    # The Responses API takes inline tool definitions instead of pre-registered toolgroups
-    # logger.info("Registering MCP servers")
-    # await register_mcp_servers_async(logger, configuration.configuration)
-    # get_logger("app.endpoints.handlers")
+    # NOTE: MCP server registration not needed for Responses API —
+    # the Responses API takes inline tool definitions instead of pre-registered toolgroups.
     logger.info("App startup complete")
```

tests/e2e/features/steps/mcp.py (2)
142-242: Extract shared helper for auth header verification steps.
check_file_auth_header,check_k8s_auth_header, andcheck_client_auth_headershare nearly identical logic — fetch/debug/requests, filter by Authorization header match, assert. The only differences are exact-match vs. substring-match and the log message. Consider a shared helper to reduce the ~100 lines of duplication:♻️ Example helper extraction
def _check_auth_header_in_log( expected: str, *, exact: bool = True, label: str = "auth" ) -> None: """Check that the mock server received requests with the expected auth header.""" try: response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15) assert response.status_code == 200, "Failed to get debug requests" requests_log = response.json() if exact: matches = [ r for r in requests_log if r.get("headers", {}).get("Authorization") == expected ] else: matches = [ r for r in requests_log if expected in r.get("headers", {}).get("Authorization", "") ] assert len(matches) > 0, f"No requests found with {label} token: {expected}" print(f"✅ Found {len(matches)} request(s) with {label} token") except requests.RequestException as e: raise AssertionError(f"Could not connect to MCP mock server: {e}") from e
91-100: Fixed `time.sleep` waits are inherently flaky in E2E tests.

`wait_for_mcp_requests` (line 99: `sleep(2)`) and `wait_for_tool_calls` (line 478: `sleep(3)`) use hard-coded delays. These may be too short in slow CI environments or too long in fast ones. A polling loop with a timeout (e.g., poll `/debug/requests` until the expected count is reached or the timeout expires) would be more robust.

♻️ Example polling approach:

```python
def _wait_for_mock_requests(
    min_count: int = 1, timeout: float = 10.0, poll_interval: float = 0.5
) -> None:
    """Poll mock server until at least min_count requests are logged or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=2)
            if resp.status_code == 200 and len(resp.json()) >= min_count:
                return
        except requests.RequestException:
            pass
        time.sleep(poll_interval)
    # Don't fail here — the assertion step will catch it
```

tests/e2e/features/mcp_tools.feature (1)
113-124: CI timing concern is documented but no timeout mechanism is specified.

The comment notes this scenario can take up to 120s in CI. If the behave step timeout is lower than this, it could cause flaky failures. Consider adding a `@slow` tag or ensuring the test framework timeout accommodates this.
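If a `@slow` tag were added, behave's environment hooks could adjust per-scenario timeouts without a global bump; a hedged sketch for `environment.py`, where the tag name and timeout values are hypothetical:

```python
def before_scenario(context, scenario) -> None:
    """Give known-slow MCP scenarios a larger request timeout."""
    # scenario.tags lists the tags declared on the scenario in the feature file.
    context.request_timeout = 120 if "slow" in scenario.tags else 30
```

Steps would then pass `timeout=context.request_timeout` to their `requests.post` calls, and fast-feedback CI runs could still deselect the tag with behave's `--tags` filtering.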
tests/e2e/configuration/server-mode/lightspeed-stack-mcp.yaml (1)

1-51: Conversation cache is omitted without explanation.

While `conversation_cache` is present in other e2e configs (library-mode and server-mode variants), the MCP-specific configs omit it. However, the query endpoint gracefully handles this case: when `conversation_cache_configuration.type` is `None`, the code logs a warning and skips caching rather than failing. The MCP tests will execute successfully without it.

That said, for consistency and clarity, either add the `conversation_cache` section or include an explanatory comment (as the `*-no-cache.yaml` configs do for explicit error-handling scenarios).
| @then("The response should contain MCP tool execution results") | ||
| @then("The response should indicate successful tool execution") | ||
| @then("The response should contain tool call information") | ||
| @then("The tool execution results should be included in the response") | ||
| def check_response_has_tool_execution(context: Context) -> None: | ||
| """Verify response contains evidence of MCP tool execution. | ||
|
|
||
| This consolidated step checks that the response contains tool-related content, | ||
| which could be tool calls, tool results, or references to tool execution in | ||
| the response text. | ||
|
|
||
| Parameters: | ||
| context (Context): Behave context. | ||
| """ | ||
| assert context.response is not None, "No response received" | ||
| assert ( | ||
| context.response.status_code == 200 | ||
| ), f"Bad status: {context.response.status_code}" | ||
|
|
||
| response_data = context.response.json() | ||
| assert "response" in response_data, "Response missing 'response' field" | ||
|
|
||
| # Check for evidence of tool execution in the response | ||
| response_text = str(response_data.get("response", "")).lower() | ||
|
|
||
| # Look for tool-related indicators in the response | ||
| # (tool execution, mock tool, or specific tool results) | ||
| has_tool_content = any( | ||
| indicator in response_text | ||
| for indicator in ["tool", "mock", "executed", "success"] | ||
| ) | ||
|
|
||
| assert has_tool_content, ( | ||
| "Response does not contain evidence of tool execution. " | ||
| f"Response text: {response_data.get('response', '')[:200]}" | ||
| ) | ||
|
|
||
| print("✅ Response contains evidence of tool execution") |
There was a problem hiding this comment.
Consolidated tool execution step is an improvement, but the keyword check is fragile.
The consolidation of four step decorators into one function is good (addresses past review). However, the assertion on lines 595-598 checks for generic substrings ("tool", "mock", "executed", "success") in the lowercased response text. A response like "I was unable to successfully find any tools" would pass this check despite indicating failure.
Consider checking for more specific indicators such as the mock tool names (e.g., "mock_tool_file", "mock_tool_k8s") or looking into tool_calls/tool_results fields if available in the response payload.
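A sketch of the stricter check; the `tool_calls`/`tool_results` keys are an assumption about the response payload, while the mock tool names come from the mock server added in this PR:

```python
def has_tool_evidence(response_data: dict) -> bool:
    """Prefer structured fields, fall back to concrete mock tool names."""
    # Structured evidence: non-empty tool_calls / tool_results fields, if present.
    if any(response_data.get(key) for key in ("tool_calls", "tool_results")):
        return True
    # Fallback: look for the specific mock tool names rather than generic words.
    text = str(response_data.get("response", "")).lower()
    return any(
        name in text
        for name in ("mock_tool_file", "mock_tool_k8s", "mock_tool_client")
    )


# Usage sketch
assert has_tool_evidence({"response": "Called mock_tool_file successfully"})
```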
| @then("All tool calls should have succeeded") | ||
| def check_all_tool_calls_succeeded(context: Context) -> None: | ||
| """Verify all tool calls succeeded. | ||
|
|
||
| Parameters: | ||
| context (Context): Behave context. | ||
| """ | ||
| assert context.response is not None, "No response received" | ||
| assert context.response.status_code == 200, "Request failed" | ||
| print("✅ All tool calls completed successfully") | ||
|
|
||
|
|
||
| @then("The response should contain results from all tool calls") | ||
| def check_response_has_all_results(context: Context) -> None: | ||
| """Verify response contains results from all tool calls. | ||
|
|
||
| Parameters: | ||
| context (Context): Behave context. | ||
| """ | ||
| assert context.response is not None, "No response received" | ||
| response_data = context.response.json() | ||
|
|
||
| assert "response" in response_data, "Response missing 'response' field" | ||
| print("✅ Response contains results from all tool calls") |
There was a problem hiding this comment.
check_all_tool_calls_succeeded and check_response_has_all_results provide no meaningful assertions.
check_all_tool_calls_succeeded (line 716-724) only checks status_code == 200, and check_response_has_all_results (line 728-738) only checks "response" in response_data. Any successful non-tool query would pass both checks. These steps give false confidence that tool calls actually occurred and succeeded.
At minimum, cross-reference with the mock server's request log to verify tools/call entries were actually made, or check for tool-specific content in the response body.
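A sketch of that cross-check against the mock's log; the endpoint and field names (`/debug/requests`, `rpc_method`) follow the mock described in this PR and should be adjusted if they differ, and the helper itself is hypothetical:

```python
import requests

MOCK_MCP_SERVER_URL = "http://localhost:9000"  # assumed test address


def assert_tools_were_called(min_calls: int = 1) -> list[dict]:
    """Fail unless the mock recorded at least min_calls JSON-RPC tools/call requests."""
    resp = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
    assert resp.status_code == 200, f"Failed to get debug requests: {resp.status_code}"
    calls = [e for e in resp.json() if e.get("rpc_method") == "tools/call"]
    assert len(calls) >= min_calls, (
        f"Expected at least {min_calls} tools/call request(s), got {len(calls)}"
    )
    return calls
```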
Description
This PR enhances MCP support with three key improvements:
E2E tests implemented here will still fail, as they would probably require additional LCore changes
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
https://issues.redhat.com/browse/LCORE-1243
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests
Chores