Adjusting MCP header processing to llama stack 0.4.x and adding MCP e2e tests#1141

Open
blublinsky wants to merge 1 commit into lightspeed-core:main from blublinsky:mcp-e2e-simple

Conversation

blublinsky (Contributor) commented Feb 12, 2026

Description

This PR enhances MCP support with three key improvements:

  1. Header Processing Alignment: Separates the Authorization header into a dedicated parameter for llama_stack 0.4.x+ compatibility, ensuring proper MCP server authentication (a minimal sketch of the intended tool-definition shape follows this list)
  2. Comprehensive Logging: Adds detailed logging throughout the MCP lifecycle (startup configuration, tool preparation, header resolution, and execution) for simplified debugging of MCP interactions
  3. E2E Testing: Introduces 19 new end-to-end test scenarios covering all MCP authentication methods (kubernetes, client-provided, and file-based), with all tests passing (255/255 unit + integration tests)
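
For item 1, here is a minimal illustrative sketch of the intended shape; the helper and exact field handling below are assumptions for illustration, not the implementation in src/utils/responses.py:

```python
# Illustrative sketch: split the Authorization value out of the generic header
# dict into a dedicated "authorization" field on the MCP tool definition, which
# is what llama_stack 0.4.x+ expects per this PR. Names here are assumptions.
def build_mcp_tool_def(server_label: str, server_url: str, headers: dict[str, str]) -> dict:
    remaining = dict(headers)
    auth_value = None
    for name in list(remaining):
        if name.lower() == "authorization":  # case-insensitive match
            auth_value = remaining.pop(name)
    tool_def: dict = {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "headers": remaining,
    }
    if auth_value is not None:
        tool_def["authorization"] = auth_value  # dedicated parameter, not a plain header
    return tool_def


print(build_mcp_tool_def("mock-k8s-auth", "http://localhost:9000",
                         {"Authorization": "Bearer my-k8s-token"}))
```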

The E2E tests implemented here will still fail, as they will likely require additional LCore changes.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • JSON-RPC-style MCP endpoints (initialize, tools/list, tools/call) with mock tool execution and a /debug/clear endpoint; MCP mock server exposed on port 9000.
    • Conversation cache configuration option added.
  • Documentation

    • README updated with explicit MCP endpoint semantics, startup examples, and troubleshooting notes.
  • Tests

    • Extensive end-to-end MCP test suite covering multiple auth methods, discovery, tool execution, streaming, and error scenarios.
  • Chores

    • Expanded MCP-related logging and debug utilities for testing.

coderabbitai bot (Contributor) commented Feb 12, 2026

Walkthrough

Adds JSON-RPC MCP endpoints (initialize, tools/list, tools/call) to the mock server, changes MCP header/authorization handling and logging across the app, replaces runtime MCP registration with config-driven logging, expands MCP test configs and E2E tests, and updates docker-compose ports/entrypoints and unit tests.

Changes

  • MCP Mock Server (dev-tools/mcp-mock-server/README.md, dev-tools/mcp-mock-server/server.py): Convert the mock server to JSON-RPC methods (initialize, tools/list, tools/call); add an error mode triggered via the auth header, default tool metadata, and unified response construction and logging; expose /debug/clear; improve request logging (rpc_method, tool_name). A sketch of a JSON-RPC call against the mock follows this table.
  • Docker Compose (docker-compose.yaml, docker-compose-library.yaml): Change the host port mapping for mcp-mock-server from 3000:3000 to 9000:3000; add a lightspeed-stack entrypoint that writes a test token to /tmp/lightspeed-mcp-test-token and runs the stack script.
  • Core app & constants (src/constants.py, src/app/main.py, src/app/endpoints/streaming_query.py): Add the MCP_AUTHORIZATION_HEADER constant; remove runtime MCP server registration at startup (log configured MCP servers instead); add MCP-related debug logging in the streaming query endpoint.
  • Tool preparation & header handling (src/utils/responses.py): Extend the prepare_tools, get_mcp_tools, and build_tool_call_summary signatures and logging; resolve per-server authorization (special-casing the Authorization header), attach an authorization field to MCP tool defs, skip servers missing required auth, and add verbose MCP call/list/approval logging.
  • E2E test configs (tests/e2e/configuration/.../lightspeed-stack*.yaml): Add MCP-focused configs for library/server modes, a sqlite conversation_cache config, and multiple test MCP server entries (mock-file-auth, mock-k8s-auth, mock-client-auth, plus legacy entries) with varied auth header modes.
  • E2E hooks & test list (tests/e2e/features/environment.py, tests/e2e/test_list.txt): Add MCP lifecycle hooks that back up, switch, and restore mode-specific configs and restart the stack around MCP-tagged features; add features/mcp_tools.feature to test discovery.
  • E2E feature & step defs (tests/e2e/features/mcp_tools.feature, tests/e2e/features/steps/mcp.py): Add comprehensive MCP Gherkin scenarios and Behave steps covering MCP-HEADERS management, mock-server debug endpoints, queries (standard and streaming), tools/list and tools/call validation, error-mode testing, and extensive assertions on logs, headers, and tool results.
  • Unit test updates (tests/unit/app/endpoints/test_streaming_query.py, tests/unit/utils/test_responses.py): Update tests to set mock_responses_params.tools = None, import and use MCP_AUTHORIZATION_HEADER in assertions, adjust expected header placement, and add server_label where required.
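
For concreteness, a hedged sketch of exercising the mock server's JSON-RPC and debug surface; it assumes the 9000:3000 host port mapping described above and that the JSON-RPC endpoint is served at the root path, and the exact response shapes are the mock's own:

```python
# Hedged sketch: call the mock MCP server's JSON-RPC tools/list method and then
# read back the recorded requests from /debug/requests. Assumes localhost:9000
# (per docker-compose) and a root-path JSON-RPC endpoint.
import requests

MOCK_URL = "http://localhost:9000"

payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
resp = requests.post(
    MOCK_URL,
    json=payload,
    headers={"Authorization": "Bearer test-secret-token-123"},
    timeout=15,
)
print(resp.status_code, resp.json())

# The debug endpoint returns recorded requests, including the rpc_method field.
log = requests.get(f"{MOCK_URL}/debug/requests", timeout=15).json()
print([entry.get("rpc_method") for entry in log])
```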

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant LightspeedStack
    participant MCPMockServer
    participant LlamaStack

    Client->>LightspeedStack: Send query (may include MCP-HEADERS)
    LightspeedStack->>LightspeedStack: prepare_tools(): resolve per-server headers, build MCP tool defs (attach authorization)
    LightspeedStack->>MCPMockServer: POST JSON-RPC "tools/list"
    MCPMockServer-->>LightspeedStack: return tools array
    LightspeedStack->>LlamaStack: Call Responses API with MCP tools
    LlamaStack->>MCPMockServer: POST JSON-RPC "tools/call" (when tool invoked)
    MCPMockServer-->>LlamaStack: tool result or error
    LlamaStack-->>LightspeedStack: response containing tool_results
    LightspeedStack-->>Client: final aggregated response (streaming or non-streaming)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • asimurka
  • eranco74
🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main changes: adjusting MCP header processing for llama_stack 0.4.x compatibility and adding comprehensive MCP end-to-end tests.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 860-874: The step implementation for
check_streaming_response_has_tool_results only asserts status_code == 200 and
duplicates check_streaming_response_successful; update the step to either
implement real SSE parsing of context.response to assert presence of tool
execution events (parse Server-Sent Events, inspect event types/payloads for
tool results) or at minimum add a clear TODO/FIXME comment and change the
assertion to fail if SSE is expected but not parsed (e.g., mark test as xfail or
raise NotImplementedError), referencing
check_streaming_response_has_tool_results and
check_streaming_response_successful so reviewers can find and reconcile the
duplicate behavior. A hedged SSE-parsing sketch follows this list.
- Around line 775-806: The streaming request in
send_streaming_query_with_mcp_tools sets stream=True but never consumes or
closes the response, leaking connections; fix by ensuring context.response is
closed when you don't need the stream (either call response.close() before
returning from send_streaming_query_with_mcp_tools or change to stream=False if
streaming isn't required), or add a global cleanup (e.g., after_scenario hook
that grabs getattr(context, "response", None) and calls close()) to guarantee
closure; apply the same fix to the other streaming helper referenced around
lines 809-843 so no requests.Response objects remain unclosed.
- Around line 581-641: Four step functions (check_response_has_tool_results,
check_successful_tool_execution, check_response_has_tool_info,
check_tool_results_in_response) all perform the same minimal assertions, which
masks real tool-verification; either consolidate them into one reusable step or
strengthen each with tool-specific checks (e.g., assert presence of
"tool_calls", "tool_results", or expected keys inside response["response"], or
validate a tool call entry structure/content). Update the functions to call a
shared helper (e.g., assert_tool_response_contains(response_data,
required_keys)) or replace duplicate bodies with distinct assertions that check
tool-specific fields or content substrings so the steps actually validate tool
execution and call metadata.
- Around line 530-557: The test step function check_tools_list_calls currently
issues its own tools/list JSON-RPC request (payload) to prove the mock is alive;
change it to inspect the MCP mock server's recorded request/log endpoint (use
mock_server_url + the mock's requests or debug API) and assert that at least one
recorded request has method == "tools/list"; specifically replace the
requests.post(payload) call with a request to the mock server's
recorded-requests API, parse the returned request entries for entry["method"] or
entry["jsonrpc"]["method"] == "tools/list", and raise AssertionError if none
found while keeping the same timeout and RequestException handling.
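
For the first item above, a hedged sketch of SSE parsing; the "tool" substring check is a loose placeholder, since the real event schema comes from the /v1/streaming_query implementation:

```python
# Hedged sketch for check_streaming_response_has_tool_results: walk the SSE
# stream line by line and look for tool-related events. Tighten the substring
# check to the endpoint's real event types once they are confirmed.
import json


def streaming_response_has_tool_events(response) -> bool:
    """Return True if any SSE data payload looks like a tool event."""
    try:
        for raw_line in response.iter_lines(decode_unicode=True):
            if not raw_line or not raw_line.startswith("data:"):
                continue
            data = raw_line[len("data:"):].strip()
            try:
                event = json.loads(data)
            except json.JSONDecodeError:
                continue
            if "tool" in json.dumps(event).lower():
                return True
        return False
    finally:
        response.close()  # also avoids the unclosed-stream leak noted in the second item
```
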
🧹 Nitpick comments (16)
docker-compose-library.yaml (1)

69-73: Hardcoded test secret in entrypoint — acceptable for test-only compose files, but verify it won't leak into production images.

The token test-secret-token-123 is written at container startup. This is fine for a test-only docker-compose file, but ensure this compose file is never used in production or CI pipelines that deploy to shared environments.

One minor note: if the Python process crashes on startup, the exit code is swallowed by bash -c. Consider adding set -e or exec before the Python command so the container properly reports failure.

Suggested improvement
     entrypoint: >
       /bin/bash -c "
       printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
       "
dev-tools/mcp-mock-server/server.py (4)

63-67: Excessive pylint suppressions suggest the method needs decomposition.

do_POST has four pylint disables (invalid-name, too-many-locals, too-many-branches, too-many-statements). For a test mock this is acceptable, but consider extracting the method-dispatch logic into helper methods (e.g., _handle_initialize, _handle_tools_list, _handle_tools_call) to reduce complexity.
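
A hedged sketch of that decomposition; the handler names follow the suggestion above, and the bodies are placeholders rather than the mock's real responses:

```python
# Hedged sketch: dispatch each JSON-RPC method to a small handler instead of
# branching inside do_POST. Placeholder return values only.
class MockMCPHandler:  # stand-in for the BaseHTTPRequestHandler subclass
    def _handle_initialize(self, params: dict) -> dict:
        return {"capabilities": {}}  # placeholder

    def _handle_tools_list(self, params: dict) -> dict:
        return {"tools": []}  # placeholder

    def _handle_tools_call(self, params: dict) -> dict:
        return {"content": [{"type": "text", "text": "mock result"}]}  # placeholder

    def dispatch(self, rpc_method: str, params: dict) -> dict:
        handlers = {
            "initialize": self._handle_initialize,
            "tools/list": self._handle_tools_list,
            "tools/call": self._handle_tools_call,
        }
        handler = handlers.get(rpc_method)
        if handler is None:
            raise ValueError(f"Unknown JSON-RPC method: {rpc_method}")
        return handler(params)
```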


84-86: Fragile mutation of request_log[-1] — relies on ordering with _capture_headers.

This appends rpc_method to the last entry in request_log, which was just added by _capture_headers() on line 69. If _capture_headers ever changes to not append (e.g., due to an exception or refactor), this would raise an IndexError or annotate the wrong entry. For a test mock, this is low risk.


97-113: match True pattern is unconventional — consider using if/elif instead.

The match True: case _ if ... pattern works but reads oddly compared to a straightforward if/elif chain, which would be more idiomatic for this type of conditional dispatch.


235-239: /debug/clear endpoint uses GET for a mutating operation.

Using GET for state mutation violates HTTP semantics. For a test-only mock server this is fine, but a POST or DELETE would be more conventional. No action needed unless this mock is reused more broadly.

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

21-25: New conversation_cache block added — appears unrelated to MCP changes.

This sqlite conversation cache addition seems orthogonal to the MCP header processing changes in this PR. If it's needed for the MCP tests specifically, a comment explaining why would be helpful. Otherwise, consider splitting it into a separate commit/PR for cleaner change tracking.

src/utils/responses.py (2)

139-139: Increasing pylint complexity suppressions — the function is growing.

prepare_tools now suppresses too-many-locals and too-many-branches. Most of the added lines are debug logging. This is acceptable for now, but if more logic is added in the future, consider extracting the logging/summary block (lines 189-235) into a helper.


453-485: Header resolution skip logic is correct but the logging reconstruction is complex.

The resolved/missing header computation for the warning message (lines 458-471) handles case-insensitive Authorization matching well, but the logic is intricate for what's essentially a log message. Consider simplifying by tracking resolved header names as you go (in the resolution loop above) rather than reconstructing them after the fact.
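
A hedged sketch of that simplification; resolve_value stands in for the real per-header resolution logic:

```python
# Hedged sketch: collect resolved and missing header names during resolution,
# so the skip warning can use them directly instead of reconstructing them.
from typing import Callable, Optional


def split_resolved_headers(
    required_headers: dict[str, str],
    resolve_value: Callable[[str, str], Optional[str]],
) -> tuple[dict[str, str], list[str]]:
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for name, configured in required_headers.items():
        value = resolve_value(name, configured)
        if value is None:
            missing.append(name)
        else:
            resolved[name] = value
    return resolved, missing


resolved, missing = split_resolved_headers(
    {"Authorization": "kubernetes", "X-API-Token": "client"},
    lambda name, configured: None if configured == "client" else "Bearer token",
)
print(resolved, missing)  # the warning message can simply list `missing`
```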

docker-compose.yaml (1)

87-91: Hardcoded Python version in entrypoint path.

The path /app-root/.venv/bin/python3.12 is brittle — if the base image upgrades Python, this will break silently. Consider using /app-root/.venv/bin/python or /app-root/.venv/bin/python3 instead.

Suggested change
     entrypoint: >
       /bin/bash -c "
       printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      /app-root/.venv/bin/python3 /app-root/src/lightspeed_stack.py
       "
src/app/main.py (2)

23-24: Remove commented-out import.

This commented-out import is dead code. If the Responses API approach is confirmed, remove it entirely rather than leaving breadcrumbs.

Suggested change
-
-# from utils.common import register_mcp_servers_async  # Not needed for Responses API

94-98: Remove commented-out registration code.

Lines 94-98 contain commented-out code that duplicates the intent already expressed by the NOTE comment on line 94-95. The old function calls on lines 96-98 should be removed.

Suggested change
     # NOTE: MCP server registration not needed for Responses API
     # The Responses API takes inline tool definitions instead of pre-registered toolgroups
-    # logger.info("Registering MCP servers")
-    # await register_mcp_servers_async(logger, configuration.configuration)
-    # get_logger("app.endpoints.handlers")
tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

20-24: Conversation cache addition appears unrelated to MCP changes.

This conversation_cache block seems orthogonal to the MCP work. If intentional, consider noting it in the PR description.

tests/e2e/features/mcp_tools.feature (1)

58-64: Consider verifying the negative test more explicitly.

The "skipped when MCP-HEADERS is missing" scenario (line 58) validates via tool log contents (exactly mock_tool_file, mock_tool_k8s), which implicitly asserts that mock_tool_client is absent. This is acceptable, but adding an explicit "should NOT contain tool mock_tool_client" step would make the intent clearer to future readers.

tests/e2e/features/steps/mcp.py (3)

51-84: Extract a shared helper to reduce query-sending duplication.

Five functions (send_query_with_mcp_tools, send_query_about_tools, send_query_requesting_tool_usage, send_query_triggering_tool_usage, send_query_requiring_multiple_tools) follow the same pattern: initialize auth_headers, build URL, POST payload, handle exception. They differ only in query text and timeout.

Also note that send_query_with_mcp_tools (Line 64) omits model/provider from the payload while the other four include them — verify this is intentional.

♻️ Suggested helper
def _send_query(
    context: Context,
    query: str,
    *,
    timeout: int = 30,
    include_model: bool = True,
    stream: bool = False,
) -> None:
    """Common helper for sending query requests."""
    if not hasattr(context, "auth_headers"):
        context.auth_headers = {}

    base_url = f"http://{context.hostname}:{context.port}"
    endpoint = "streaming_query" if stream else "query"
    url = f"{base_url}/v1/{endpoint}"

    payload: dict = {"query": query}
    if include_model:
        payload["model"] = getattr(context, "default_model", "gpt-4o-mini")
        payload["provider"] = getattr(context, "default_provider", "openai")

    try:
        context.response = requests.post(
            url, json=payload, headers=context.auth_headers,
            timeout=timeout, stream=stream,
        )
    except requests.RequestException as e:
        print(f"❌ Query request failed: {e}")
        context.response = None

Each step then becomes a one-liner, e.g.:

@when("I send a query that uses MCP tools")
def send_query_with_mcp_tools(context: Context) -> None:
    _send_query(context, "What tools are available?", include_model=False)

Also applies to: 382-479, 493-527


98-150: Extract a helper for fetching the mock server request log.

The pattern of GET /debug/requests → assert 200 → parse JSON → handle RequestException is duplicated across ~10 functions (check_mcp_server_received_requests, check_mcp_server_request_count, check_file_auth_header, check_k8s_auth_header, check_client_auth_header, check_mcp_tool_in_log, check_mcp_tool_not_in_log, check_mcp_exact_tools_in_log, check_tools_call_method, check_multiple_tool_calls).

♻️ Suggested helper
def _get_mock_request_log() -> list:
    """Fetch the request log from the MCP mock server's debug endpoint."""
    try:
        response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
        assert response.status_code == 200, (
            f"Failed to get debug requests: {response.status_code}"
        )
        log = response.json()
        assert isinstance(log, list), f"Expected list, got {type(log)}"
        return log
    except requests.RequestException as e:
        raise AssertionError(f"Could not connect to MCP mock server: {e}") from e

Each step then just calls _get_mock_request_log() and runs its specific assertion, e.g.:

@then("The MCP mock server should have received requests")
def check_mcp_server_received_requests(context: Context) -> None:
    log = _get_mock_request_log()
    assert len(log) > 0, "MCP mock server received no requests"

276-282: Prefer any() over manual loop-and-flag.

♻️ Suggested simplification
-        found = False
-        for req in requests_log:
-            if req.get("tool_name") == tool_name:
-                found = True
-                break
-
-        assert found, f"Tool '{tool_name}' not found in mock server request log"
+        assert any(
+            req.get("tool_name") == tool_name for req in requests_log
+        ), f"Tool '{tool_name}' not found in mock server request log"

mcp_server_endpoints = [
(t.get("server_label"), t.get("server_url"))
for t in responses_params.tools
if t.get("type") == "mcp"
Contributor:
It is repeated a few lines above; the filtering can be performed once.

Author (blublinsky):
fixed

)
# Retrieve response stream (may raise exceptions)
# Log request details before calling Llama Stack (MCP debugging)
if responses_params.tools:
Contributor:
The condition should be explicit; what exactly does it mean written like this?

Author (blublinsky):
fixed

for t in responses_params.tools
if t.get("type") == "mcp"
]
if mcp_server_endpoints:
Contributor:
again the condition should be explicit

Author (blublinsky):
fixed

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@tests/e2e/configuration/library-mode/lightspeed-stack-mcp.yaml`:
- Around line 49-50: Add a POSIX newline at EOF: open the YAML file and ensure
the final line (the mapping under authorization_headers, e.g. X-API-Token:
"client") is followed by a single trailing newline character so the file ends
with a newline.

In `@tests/e2e/features/environment.py`:
- Around line 179-182: The cleanup currently deletes context.auth_headers but
misses clearing scenario-specific context.expect_tool_errors (set by
configure_mock_server_errors) which affects timeout logic in
send_query_with_mcp_tools; update the teardown to also check for and delete
context.expect_tool_errors (similar to auth_headers) so that per-scenario state
does not leak between scenarios.

In `@tests/e2e/features/mcp_tools.feature`:
- Around line 48-56: The scenario "MCP mock server receives client-provided
token via MCP-HEADERS" promises verification of the client token but only
asserts tools called; add an assertion step that checks the recorded request's
Authorization header equals the expected token (e.g.
"my-k8s-token"/"my-client-token") after the existing tool assertions. Locate the
scenario and add a step similar to the other scenarios' header checks (see the
file-auth and k8s-auth scenarios) so the MCP mock server request log is queried
for the Authorization header from the client-auth request and asserted to
contain the expected token value.

In `@tests/e2e/features/steps/mcp.py`:
- Around line 555-573: The test check_tools_call_method currently only asserts
that requests_log has entries but doesn't verify any are for the "tools/call"
RPC; update check_tools_call_method to filter response.json() for entries where
entry.get("rpc_method") == "tools/call" (same approach as
check_tools_list_calls), then assert that the filtered list length is > 0 and
raise a clear assertion if none found; keep existing timeout/exception handling
and the initial GET to f"{mock_server_url}/debug/requests" but replace the
generic len(requests_log) check with the filtered check for "tools/call".
- Around line 51-84: The step function send_query_with_mcp_tools builds a
payload missing the "model" and "provider" keys causing inconsistent behavior
versus other send_* steps; update the payload in send_query_with_mcp_tools to
include the same "model" and "provider" fields used by the other helpers (see
send_query_about_tools, send_query_requesting_tool_usage,
send_query_triggering_tool_usage, send_query_requiring_multiple_tools) so the
request uses explicit model/provider values rather than defaults, preserving the
same keys and structure those functions send.
- Around line 697-718: The test check_multiple_tool_calls currently asserts on
total requests_log length which can be inflated; change it to filter the debug
requests for only tool-call entries (e.g., where each entry's method or path
indicates "tools/call") and assert the filtered list has length >= 3; update
variable names (e.g., tool_calls = [r for r in requests_log if r.get("method")==
"tools/call" or "tools/call" in r.get("path","")]) and use len(tool_calls) in
the assertion and the printed message so the test verifies actual tool call
requests.
- Around line 747-778: The test sends a request to /v1/streaming_query without
specifying media_type so the endpoint emits SSE (text/plain) and later
response.json() will fail; in send_streaming_query_with_mcp_tools update the
JSON payload to include "media_type": "application/json" (or alternatively
adjust the test assertions to parse SSE events instead of calling
response.json()); apply the same change to the other streaming test payloads in
this file that call /v1/streaming_query so they either include media_type or
handle SSE parsing in their assertion logic.
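
To illustrate the last item, a hedged sketch of both options; the host and port below are hypothetical, and the payload fields follow the comment's description of /v1/streaming_query:

```python
# Hedged sketch for the streaming steps. Option A asks the endpoint for JSON so
# response.json() is valid; option B keeps the default SSE output and iterates
# over the event lines instead. BASE is a hypothetical host/port.
import requests

BASE = "http://localhost:8080"

# Option A: request JSON back via media_type.
payload = {"query": "What tools are available?", "media_type": "application/json"}
resp = requests.post(f"{BASE}/v1/streaming_query", json=payload, timeout=60)
print(resp.json())

# Option B: keep SSE and parse the event stream; the context manager closes it.
with requests.post(
    f"{BASE}/v1/streaming_query",
    json={"query": "What tools are available?"},
    stream=True,
    timeout=60,
) as sse:
    for line in sse.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line)
```
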
🧹 Nitpick comments (6)
dev-tools/mcp-mock-server/server.py (1)

97-113: Consider using if/elif instead of match True: for clarity.

The match True: case _ if "..." in auth_header: pattern is functionally correct but unconventional. A standard if/elif chain is more idiomatic and readable for simple string-contains checks. This is a test tool though, so it's a minor style nit.

tests/e2e/features/mcp_tools.feature (1)

119-130: Good CI timing documentation on slow scenarios.

The comments about CI timing variability (Lines 119–120 and 142–143) are helpful for maintainers. Consider also tagging these scenarios with something like @slow so CI pipelines can optionally skip them in fast-feedback runs.

tests/e2e/features/steps/mcp.py (4)

10-11: Consider making the mock server URL configurable via environment variable.

MOCK_MCP_SERVER_URL is hardcoded to localhost:9000. Other test infrastructure in this repo (e.g., E2E_LSC_HOSTNAME, E2E_LSC_PORT) uses environment variables, which allows flexibility when running tests against different environments.

Suggested change
-MOCK_MCP_SERVER_URL = "http://localhost:9000"
+MOCK_MCP_SERVER_URL = os.getenv("MCP_MOCK_SERVER_URL", "http://localhost:9000")

(Would also need import os at the top.)


321-339: Step name says "exactly" but implementation allows a subset of listed tools.

The Gherkin step text The MCP mock server request log should contain exactly tools {tool_list} implies all listed tools must be present. The actual implementation (Lines 355–363) only requires at least one expected tool to have been called, and it does not assert that all expected tools were called. This is intentionally flexible per the docstring, but the word "exactly" in the step name is misleading.

Consider renaming to something like should contain only tools from {tool_list} to better reflect the "at-least-one from this set, nothing outside it" semantics.


721-744: Weak assertions: steps only verify HTTP 200 / key existence.

check_all_tool_calls_succeeded (Lines 721–730) only asserts status_code == 200, and check_response_has_all_results (Lines 733–744) only asserts the response key exists. Neither verifies anything tool-specific. These pass for any successful query, including one that uses no tools.

If stronger assertions aren't feasible yet (due to LLM response non-determinism), consider adding a # TODO noting the gap so it's tracked.


153-253: Consider extracting a shared helper for /debug/requests fetching.

check_file_auth_header, check_k8s_auth_header, and check_client_auth_header (and many other steps) all repeat the same pattern: GET /debug/requests with 15s timeout, assert 200, parse JSON, then filter. A small helper would reduce duplication:

def _get_mock_request_log() -> list:
    """Fetch the MCP mock server request log."""
    response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
    assert response.status_code == 200, f"Failed to get debug requests: {response.status_code}"
    return response.json()

This would simplify each step to a single call + domain-specific filtering.

Comment on lines 49 to 50
authorization_headers:
X-API-Token: "client"
(No newline at end of file)
⚠️ Potential issue | 🟡 Minor

Missing trailing newline at end of file.

Most POSIX tools and linters expect files to end with a newline character.

Proposed fix
     authorization_headers:
       X-API-Token: "client"
+

Comment on lines 179 to 182
# Clear auth headers to prevent leakage between scenarios
# Context attributes persist across scenarios unless explicitly cleared
if hasattr(context, "auth_headers"):
    del context.auth_headers
⚠️ Potential issue | 🟡 Minor

Good: auth_headers cleanup prevents cross-scenario leakage.

This correctly addresses the Behave Context persistence issue — custom attributes persist across scenarios unless explicitly cleared. Based on learnings, "In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared."

However, context.expect_tool_errors (set in mcp.py:configure_mock_server_errors, Line 640) is also scenario-specific state that isn't cleared here. It affects timeout selection in send_query_with_mcp_tools (Line 70–74). Consider clearing it alongside auth_headers.

Proposed fix
     # Clear auth headers to prevent leakage between scenarios
     # Context attributes persist across scenarios unless explicitly cleared
     if hasattr(context, "auth_headers"):
         del context.auth_headers
+    if hasattr(context, "expect_tool_errors"):
+        del context.expect_tool_errors

Comment on lines +48 to +56
Scenario: MCP mock server receives client-provided token via MCP-HEADERS
Given The system is in default state
And I set the Authorization header to Bearer my-k8s-token
And I set the MCP-HEADERS header with client token for "mock-client-auth"
And The MCP mock server request log is cleared
When I send a query that uses MCP tools
And I wait for MCP server to receive requests
Then The MCP mock server should have received requests
And The MCP mock server request log should contain exactly tools mock_tool_file, mock_tool_k8s, mock_tool_client
⚠️ Potential issue | 🟡 Minor

Scenario title promises token verification but only checks tool presence.

The scenario is named "MCP mock server receives client-provided token via MCP-HEADERS", implying the client token value is verified. However, the Then step only asserts which tools were called — it doesn't check the actual Authorization header value like the file-auth (Line 37) and k8s-auth (Line 46) scenarios do.

Consider adding a step like:

And The MCP mock server should have captured Authorization header containing "my-client-token" from client-auth server

Comment on lines 697 to 718
@then("The MCP mock server should have received multiple tools/call methods")
def check_multiple_tool_calls(context: Context) -> None:
"""Verify MCP server received multiple tool call requests.

Parameters:
context (Context): Behave context.
"""
mock_server_url = MOCK_MCP_SERVER_URL

try:
# Mock server debug endpoint can be slow with many requests - use 15s timeout
response = requests.get(f"{mock_server_url}/debug/requests", timeout=15)
assert response.status_code == 200, "Failed to get debug requests"

requests_log = response.json()
# We expect multiple requests (at least for discovery + calls)
assert (
len(requests_log) >= 3
), f"Expected multiple requests, got {len(requests_log)}"
print(f"✅ MCP server received {len(requests_log)} requests (multiple tools)")
except requests.RequestException as e:
raise AssertionError(f"Could not connect to MCP mock server: {e}") from e
⚠️ Potential issue | 🟠 Major

check_multiple_tool_calls doesn't filter for tools/call method.

Same issue as check_tools_call_method — it asserts len(requests_log) >= 3 on total request count rather than filtering for tools/call entries specifically. Initialization and discovery requests inflate the count, so the assertion can pass without any actual tool calls.

Proposed fix
         requests_log = response.json()
-        # We expect multiple requests (at least for discovery + calls)
-        assert (
-            len(requests_log) >= 3
-        ), f"Expected multiple requests, got {len(requests_log)}"
-        print(f"✅ MCP server received {len(requests_log)} requests (multiple tools)")
+        tools_call_entries = [
+            req for req in requests_log if req.get("rpc_method") == "tools/call"
+        ]
+        assert (
+            len(tools_call_entries) >= 2
+        ), f"Expected multiple tools/call requests, got {len(tools_call_entries)}"
+        print(f"✅ MCP server received {len(tools_call_entries)} tools/call request(s)")

restart_container("lightspeed-stack")

# Clear auth headers to prevent leakage between scenarios
# Context attributes persist across scenarios unless explicitly cleared
Contributor:
I do not see the point of this logic; the context is always cleared at the start of a new scenario, so the comment is wrong and the logic is redundant.

Author (blublinsky):
fixed

And The body of the response has proper client auth options structure
And The response contains server "mock-client-auth" with client auth header "Authorization"

Scenario: Service reports MCP configuration correctly
Contributor:
I am completely missing the point of this scenario. If the title is correct, then I would expect some verification step here to confirm the existence of the reported info; a 200 status does not tell me this.

Author (blublinsky):
removed

Scenario: MCP mock server receives file-based static token
Given The system is in default state
And I set the Authorization header to Bearer my-k8s-token
And The MCP mock server request log is cleared
Contributor:
This step is just an implementation detail; please include it in some other step definition.

Author (blublinsky):
It's not redundant - it's intentional and makes the test more maintainable.

Then The MCP mock server should have received requests
And The MCP mock server should have captured Authorization header "Bearer test-secret-token-123" from file-auth server

Scenario: MCP mock server receives kubernetes token from request
Contributor:
This scenario is identical in content to the previous one, yet it expects a different result. Or do we have three scenarios that verify the same workflow but with different outcomes?

Author (blublinsky):
The scenarios ARE different:
  • Different configuration: each MCP server is configured differently (file path vs. "kubernetes" vs. "client")
  • Different expected token values: line 31 vs. line 40 check different tokens
  • Scenario 3 has extra setup (line 45: I set the MCP-HEADERS header)
  • Scenario 3 has different verification (line 50: it checks which tools were called, not just the token)

Given The system is in default state
And I set the Authorization header to Bearer my-k8s-token
And The MCP mock server request log is cleared
When I send a query that uses MCP tools
Contributor:
A test step that uses the query endpoint is already implemented; what is the reason for defining a new one here?

Author (blublinsky):
Each query step uses intentionally different wording to trigger specific LLM behaviors (tool discovery vs. single-tool execution vs. multiple-tool execution) and includes scenario-specific configuration (different timeouts, error-handling flags), making them distinct test fixtures rather than duplicates. Consolidating them would require complex parameterization that would reduce test clarity without meaningful benefit.


@then("The MCP mock server should have received at least {count:d} requests")
def check_mcp_server_request_count(context: Context, count: int) -> None:
"""Verify the MCP mock server received at least N requests.
Contributor:
Why are there two steps for the same behavior? This step and the previous one can be merged, since using this one with a count of 1 achieves the same behavior as the previous step.

Author (blublinsky):
consolidated

Then The MCP mock server should have received requests
And The MCP mock server should have captured Authorization header containing "my-k8s-token" from k8s-auth server

Scenario: MCP mock server receives client-provided token via MCP-HEADERS
Contributor:
This scenario and "All three MCP auth types work in a single request" are identical; what is the specific reason to have both?

Author (blublinsky):
We are checking which MCP servers are invoked based on the headers provided.

@@ -0,0 +1,190 @@
@MCP
Contributor:
If the test steps are identical, then please merge them into one scenario.

Author (blublinsky):
Sorry, what does this mean?

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 568-605: The keyword check in check_response_has_tool_execution is
too broad and can give false positives; update it to look for specific
identifiers and structured fields: inspect response_data for keys like
"tool_calls" and "tool_results" and treat the presence of non-empty lists/dicts
there as success, and additionally search the lowercased response text for
concrete mock tool names such as "mock_tool_file" and "mock_tool_k8s" (instead
of generic words "tool", "mock", "executed", "success"); ensure the assertion
prefers structured-field checks first and falls back to these specific name
checks only if those fields are absent.
- Around line 715-738: Both step functions are too weak: update
check_all_tool_calls_succeeded to assert the mock server actually received
tools/call requests (e.g., inspect context.mock_server.request_log or
context.mock_server.requests and assert any entry.path contains "tools/call" and
count matches expected), and strengthen check_response_has_all_results to
validate response_data["response"] contains tool-specific results (e.g., verify
expected tool IDs/names from context.expected_tools or context.tool_calls are
present and each result has a success status or non-empty output). Use the
existing symbols check_all_tool_calls_succeeded, check_response_has_all_results,
context.response, and response_data to locate and implement these assertions.
🧹 Nitpick comments (11)
docker-compose-library.yaml (1)

69-73: Hardcoded Python interpreter path is brittle.

The entrypoint hardcodes /app-root/.venv/bin/python3.12. If the container's Python version is upgraded (e.g., to 3.13), this will silently fail at startup. Consider using /app-root/.venv/bin/python3 or /app-root/.venv/bin/python instead for forward compatibility.

♻️ Suggested change
     entrypoint: >
       /bin/bash -c "
       printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      /app-root/.venv/bin/python3 /app-root/src/lightspeed_stack.py
       "
dev-tools/mcp-mock-server/server.py (3)

28-30: Thread-safety note on shared global state.

last_headers and request_log are mutable globals accessed concurrently by HTTP and HTTPS server threads without synchronization. For a single-threaded-per-server test mock this is low risk, but concurrent requests (e.g., from parallel test scenarios) could produce corrupted state in request_log.

Acceptable for a dev/test tool, but worth a comment for future maintainers.
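
A minimal sketch of such a guard, assuming the module-level request_log list; the lock itself is the suggested addition:

```python
# Hedged sketch: serialize access to the shared request_log so the HTTP and
# HTTPS handler threads cannot interleave appends and reads.
import threading

request_log: list[dict] = []
_log_lock = threading.Lock()


def record_request(entry: dict) -> None:
    with _log_lock:
        request_log.append(entry)


def snapshot_requests() -> list[dict]:
    with _log_lock:
        return list(request_log)  # return a copy so callers never see a mutating list
```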


97-113: match True with guard clauses is non-idiomatic.

The match True: case _ if "..." in auth_header: pattern works but is an unusual substitute for if/elif/else. A simple if/elif/else chain would be clearer and more familiar to most Python developers.

♻️ Suggested simplification
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True

235-239: /debug/clear uses GET for a state-mutating operation.

Using GET for clearing state violates REST conventions (safe/idempotent). For a test tool this is pragmatic, but if other tests or monitoring accidentally hit this endpoint, it could silently wipe debug data. Consider using DELETE or POST instead.

src/utils/responses.py (2)

378-396: Nested _get_token_value captures loop variable mcp_server — safe here but fragile.

The nested function _get_token_value reads mcp_server from the enclosing for loop scope (line 390). This is safe because the function is called synchronously within the same iteration, but it's easy for a future refactor (e.g., making it a callback or storing it) to introduce a late-binding closure bug. Consider passing mcp_server as a parameter instead.

♻️ Suggested change
-    def _get_token_value(original: str, header: str) -> str | None:
-        """Convert to header value."""
+    def _get_token_value(original: str, header: str, server: ModelContextProtocolServer) -> str | None:
+        """Convert to header value."""
         match original:
             case constants.MCP_AUTH_KUBERNETES:
                 # use k8s token
                 if token is None or token == "":
                     return None
                 return f"Bearer {token}"
             case constants.MCP_AUTH_CLIENT:
                 # use client provided token
                 if mcp_headers is None:
                     return None
-                client_headers = mcp_headers.get(mcp_server.name, None)
+                client_headers = mcp_headers.get(server.name, None)
                 if client_headers is None:
                     return None
                 return client_headers.get(header, None)
             case _:
                 # use provided
                 return original

Then update the call site at line 421:

-            resolved_value = _get_token_value(value, name)
+            resolved_value = _get_token_value(value, name, mcp_server)

453-485: Complex skip-server logic is hard to follow — consider extracting a helper.

The resolved-count comparison, case-insensitive header name recovery, and missing-header computation span 30+ lines with nested conditionals. Extracting this into a small _check_required_headers_resolved(...) helper would improve readability and testability.

src/app/main.py (1)

23-24: Remove the commented-out import instead of leaving it as dead code.

The commented-out from utils.common import register_mcp_servers_async line and the associated commented-out call (lines 96-98) should be removed entirely. The explanatory comment # Not needed for Responses API could be a standalone note if needed, but dead import lines tend to accumulate.

♻️ Suggested cleanup
-
-# from utils.common import register_mcp_servers_async  # Not needed for Responses API
 from utils.llama_stack_version import check_llama_stack_version

And at lines 94-98:

-    # NOTE: MCP server registration not needed for Responses API
-    # The Responses API takes inline tool definitions instead of pre-registered toolgroups
-    # logger.info("Registering MCP servers")
-    # await register_mcp_servers_async(logger, configuration.configuration)
-    # get_logger("app.endpoints.handlers")
+    # NOTE: MCP server registration not needed for Responses API —
+    # the Responses API takes inline tool definitions instead of pre-registered toolgroups.
     logger.info("App startup complete")
tests/e2e/features/steps/mcp.py (2)

142-242: Extract shared helper for auth header verification steps.

check_file_auth_header, check_k8s_auth_header, and check_client_auth_header share nearly identical logic — fetch /debug/requests, filter by Authorization header match, assert. The only differences are exact-match vs. substring-match and the log message. Consider a shared helper to reduce the ~100 lines of duplication:

♻️ Example helper extraction
def _check_auth_header_in_log(
    expected: str, *, exact: bool = True, label: str = "auth"
) -> None:
    """Check that the mock server received requests with the expected auth header."""
    try:
        response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=15)
        assert response.status_code == 200, "Failed to get debug requests"
        requests_log = response.json()

        if exact:
            matches = [
                r for r in requests_log
                if r.get("headers", {}).get("Authorization") == expected
            ]
        else:
            matches = [
                r for r in requests_log
                if expected in r.get("headers", {}).get("Authorization", "")
            ]

        assert len(matches) > 0, f"No requests found with {label} token: {expected}"
        print(f"✅ Found {len(matches)} request(s) with {label} token")
    except requests.RequestException as e:
        raise AssertionError(f"Could not connect to MCP mock server: {e}") from e

91-100: Fixed time.sleep waits are inherently flaky in E2E tests.

wait_for_mcp_requests (line 99: sleep(2)) and wait_for_tool_calls (line 478: sleep(3)) use hard-coded delays. These may be too short in slow CI environments or too long in fast ones. A polling loop with a timeout (e.g., poll /debug/requests until the expected count is reached or timeout expires) would be more robust.

♻️ Example polling approach
def _wait_for_mock_requests(min_count: int = 1, timeout: float = 10.0, poll_interval: float = 0.5) -> None:
    """Poll mock server until at least min_count requests are logged or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=2)
            if resp.status_code == 200 and len(resp.json()) >= min_count:
                return
        except requests.RequestException:
            pass
        time.sleep(poll_interval)
    # Don't fail here — the assertion step will catch it
tests/e2e/features/mcp_tools.feature (1)

113-124: CI timing concern is documented but no timeout mechanism is specified.

The comment notes this scenario can take up to 120s in CI. If the behave step timeout is lower than this, it could cause flaky failures. Consider adding a @slow tag or ensuring the test framework timeout accommodates this.

tests/e2e/configuration/server-mode/lightspeed-stack-mcp.yaml (1)

1-51: Conversation cache is omitted without explanation.

While conversation_cache is present in other e2e configs (library-mode and server-mode variants), the MCP-specific configs omit it. However, the query endpoint gracefully handles this case: when conversation_cache_configuration.type is None, the code logs a warning and skips caching rather than failing. The MCP tests will execute successfully without it.

That said, for consistency and clarity, either add the conversation_cache section or include an explanatory comment (as the *-no-cache.yaml configs do for explicit error-handling scenarios).

Comment on lines +568 to +605
@then("The response should contain MCP tool execution results")
@then("The response should indicate successful tool execution")
@then("The response should contain tool call information")
@then("The tool execution results should be included in the response")
def check_response_has_tool_execution(context: Context) -> None:
"""Verify response contains evidence of MCP tool execution.

This consolidated step checks that the response contains tool-related content,
which could be tool calls, tool results, or references to tool execution in
the response text.

Parameters:
context (Context): Behave context.
"""
assert context.response is not None, "No response received"
assert (
context.response.status_code == 200
), f"Bad status: {context.response.status_code}"

response_data = context.response.json()
assert "response" in response_data, "Response missing 'response' field"

# Check for evidence of tool execution in the response
response_text = str(response_data.get("response", "")).lower()

# Look for tool-related indicators in the response
# (tool execution, mock tool, or specific tool results)
has_tool_content = any(
indicator in response_text
for indicator in ["tool", "mock", "executed", "success"]
)

assert has_tool_content, (
"Response does not contain evidence of tool execution. "
f"Response text: {response_data.get('response', '')[:200]}"
)

print("✅ Response contains evidence of tool execution")
⚠️ Potential issue | 🟡 Minor

Consolidated tool execution step is an improvement, but the keyword check is fragile.

The consolidation of four step decorators into one function is good (addresses past review). However, the assertion on lines 595-598 checks for generic substrings ("tool", "mock", "executed", "success") in the lowercased response text. A response like "I was unable to successfully find any tools" would pass this check despite indicating failure.

Consider checking for more specific indicators such as the mock tool names (e.g., "mock_tool_file", "mock_tool_k8s") or looking into tool_calls/tool_results fields if available in the response payload.

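A hedged sketch of the stronger check; the tool_calls/tool_results field names are assumptions about the /v1/query response payload, while the mock tool names come from the feature file:

```python
# Hedged sketch: prefer structured tool_calls/tool_results evidence, then fall
# back to the concrete mock tool names instead of generic keywords.
MOCK_TOOL_NAMES = ("mock_tool_file", "mock_tool_k8s", "mock_tool_client")


def has_tool_evidence(response_data: dict) -> bool:
    for key in ("tool_calls", "tool_results"):
        if response_data.get(key):  # non-empty list/dict counts as structured evidence
            return True
    text = str(response_data.get("response", "")).lower()
    return any(name in text for name in MOCK_TOOL_NAMES)


assert has_tool_evidence({"response": "Used mock_tool_k8s to answer."})
```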

Comment on lines +715 to +738
@then("All tool calls should have succeeded")
def check_all_tool_calls_succeeded(context: Context) -> None:
"""Verify all tool calls succeeded.

Parameters:
context (Context): Behave context.
"""
assert context.response is not None, "No response received"
assert context.response.status_code == 200, "Request failed"
print("✅ All tool calls completed successfully")


@then("The response should contain results from all tool calls")
def check_response_has_all_results(context: Context) -> None:
"""Verify response contains results from all tool calls.

Parameters:
context (Context): Behave context.
"""
assert context.response is not None, "No response received"
response_data = context.response.json()

assert "response" in response_data, "Response missing 'response' field"
print("✅ Response contains results from all tool calls")
⚠️ Potential issue | 🟡 Minor

check_all_tool_calls_succeeded and check_response_has_all_results provide no meaningful assertions.

check_all_tool_calls_succeeded (line 716-724) only checks status_code == 200, and check_response_has_all_results (line 728-738) only checks "response" in response_data. Any successful non-tool query would pass both checks. These steps give false confidence that tool calls actually occurred and succeeded.

At minimum, cross-reference with the mock server's request log to verify tools/call entries were actually made, or check for tool-specific content in the response body.

