feat: Minimal agent for AuthBridge OTEL (Approach A, zero custom observability) by Ladas · Pull Request #122 · kagenti/agent-examples

Ladas · 2026-02-13T08:33:37Z

Summary

Weather agent with zero custom observability code for Approach A (kagenti/kagenti#667).

Based on the pre-PR-114 weather agent, with only these additions:

W3C Trace Context propagation (so agent spans inherit AuthBridge root span)
OpenAI auto-instrumentation (for GenAI token metrics)
Dockerfile fix (Docker Hub base image)

The AuthBridge ext_proc creates the root span with all MLflow/OpenInference/GenAI attributes. The agent's LangChain and OpenAI auto-instrumented spans become children of that root span via the traceparent header that the ext_proc injects.
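For readers unfamiliar with the header involved, here is a minimal sketch of the W3C Trace Context `traceparent` layout (`version-traceid-parentid-flags`) and of why a child span lands in the same trace. It is not the propagator this PR uses; `parse_traceparent` and `child_traceparent` are hypothetical helper names, and the sketch only accepts the version `00` layout.

```python
import re
import secrets

# traceparent layout: version(2 hex)-trace_id(32 hex)-parent_id(16 hex)-flags(2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str):
    """Return the header fields as a dict, or None if the header is unusable."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        return None  # this sketch only handles the version 00 layout
    # All-zero IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
    }


def child_traceparent(incoming: str) -> str:
    """Build the header a child span would send downstream (illustrative only)."""
    parent = parse_traceparent(incoming)
    if parent is None:
        raise ValueError("malformed traceparent")
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    # Same trace_id, new span ID: this is what makes the span a child
    # of the trace started by the sidecar rather than a new trace.
    return f"00-{parent['trace_id']}-{new_span_id}-{parent['flags']}"
```

The trace ID is carried over unchanged, so any span created under the extracted context is attached to the trace that the sidecar started.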

What the agent does NOT have (by design)

No observability.py module
No custom middleware
No mlflow.* attribute setting
No openinference.span.kind setting
No root span creation
No input/output capture

All of the above is handled by the AuthBridge ext_proc (kagenti/kagenti#668).

Related

Issue: Design least-invasive OTEL GenAI observability for agents (zero/minimal code changes) kagenti#667
AuthBridge ext_proc PR: feat: Approach A - AuthBridge ext_proc root span for zero-agent OTEL observability kagenti#668
Baseline (full instrumentation): PR ✨ Add OpenTelemetry GenAI auto-instrumentation for MLflow compatibility #114

Test plan

Deploy on HyperShift cluster with AuthBridge ext_proc
Run MLflow E2E tests with AGENT_OBSERVABILITY_VARIANT=authbridge
Verify traces appear in MLflow with correct root span attributes

🤖 Generated with Claude Code

Remove all OpenTelemetry code, auto-instrumentation, and tracing middleware from the weather agent. All observability is now handled externally by the AuthBridge ext_proc sidecar, which:
- Creates root spans from A2A request/response
- Creates nested child spans from SSE stream events
- Sets all MLflow/OpenInference/GenAI attributes

Removed:
- observability.py (TracerProvider, middleware, span helpers)
- OTEL dependencies (opentelemetry-*, openinference-*)
- Tracing middleware from Starlette app
- All span creation/enrichment in agent code

The agent is now a plain A2A agent with zero observability overhead.

Refs kagenti/kagenti#667

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Wrap execute() with asyncio.shield() so LangGraph continues running even when the SSE client disconnects. Event emission errors from the closed connection are caught silently; the agent completes the task and stores the result in the task store regardless. This enables the ext_proc to recover the full trace via tasks/resubscribe after the original client disconnects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
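The shielding pattern this commit describes can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `long_running_work` stands in for the LangGraph run, `handle_request` for the executor entry point, and the cancellation in `main` simulates the SSE client going away.

```python
import asyncio


async def long_running_work(results: list) -> None:
    """Stand-in for the LangGraph run; records completion in `results`."""
    await asyncio.sleep(0.05)
    results.append("done")


async def handle_request(results: list) -> None:
    """Hypothetical handler: the outer await can be cancelled, the inner task is not."""
    inner = asyncio.ensure_future(long_running_work(results))
    try:
        await asyncio.shield(inner)
    except asyncio.CancelledError:
        # The caller cancelled us (for example, the SSE connection closed).
        # shield() absorbed the cancellation, so `inner` is still running;
        # wait for it so the result can be stored.
        await inner


async def main() -> list:
    results: list = []
    handler = asyncio.ensure_future(handle_request(results))
    await asyncio.sleep(0.01)  # let the work start
    handler.cancel()           # simulate the client disconnecting
    await handler              # completes normally once the inner task finishes
    return results
```

Running `asyncio.run(main())` returns the list with the work's result in it, which shows that cancelling the handler did not cancel the shielded task.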

Add _cancelled flag to distinguish between SSE disconnect (continue execution) and explicit tasks/cancel (propagate cancel to task).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

When the final event emission fails due to client disconnect, save the completed task with artifacts directly to the InMemoryTaskStore. This allows the ext_proc's tasks/resubscribe to find the completed result and capture the output for the trace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Serialize LangChain messages via model_dump() and json.dumps() instead of Python str(). This produces valid JSON that the ext_proc can parse to extract GenAI semantic convention attributes (token counts, model name, tool names) without regex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
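A minimal sketch of the serialization idea, assuming an event shaped like `{node_name: {"messages": [...]}}`. `FakeAIMessage` is a stand-in that only mimics `model_dump()`, and `serialize_event` is a hypothetical helper, not the PR's `_serialize_event()`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FakeAIMessage:
    """Stand-in for a LangChain message; only model_dump() is mimicked."""
    content: str
    usage_metadata: dict = field(default_factory=dict)

    def model_dump(self) -> dict:
        return asdict(self)


def serialize_event(event: dict) -> str:
    """Convert an event to a JSON string (the event shape is an assumption)."""

    def convert(value):
        # Objects exposing model_dump() become plain dicts.
        if hasattr(value, "model_dump"):
            return value.model_dump()
        if isinstance(value, dict):
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, (list, tuple)):
            return [convert(item) for item in value]
        return value

    # default=str keeps serialization from failing on unexpected types.
    return json.dumps(convert(event), default=str)


event = {
    "assistant": {
        "messages": [
            FakeAIMessage("Sunny", {"input_tokens": 12, "output_tokens": 3})
        ]
    }
}
parsed = json.loads(serialize_event(event))
```

By contrast, `str(event)` yields a Python repr with single quotes and class names, which `json.loads` rejects; that is why a consumer would otherwise need regex to pull out token counts.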

The emit_event(final=True) enqueues to EventQueue successfully even after SSE disconnect, but the consumer is gone so the task store never gets the artifact. Save the completed task with artifact directly to the store after every execution, not just on failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Artifact constructor requires artifactId field. Generate a UUID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Support three instrumentation modes via OTEL_INSTRUMENT env var:
- none: no agent-side instrumentation (ext_proc sidecar only)
- openinference: LangChainInstrumentor (OpenInference conventions)
- openai: OpenAIInstrumentor (gen_ai.* conventions)

Auto-instrumented spans become children of the ext_proc root span via W3C traceparent propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Without ASGI middleware, the LangChain/OpenAI auto-instrumentors create orphaned traces (separate from the ext_proc root span). Add OpenTelemetryMiddleware that extracts the traceparent header injected by the ext_proc sidecar and sets it as the active trace context. Auto-instrumented spans now become proper children of the ext_proc root span. The ASGI middleware is only enabled when OTEL_INSTRUMENT != none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
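The mode switch described in the two commit messages above could be read from the environment roughly like this. It is a sketch only: `InstrumentMode`, `read_instrument_mode`, and `needs_trace_context_middleware` are hypothetical names, and treating an unset or empty variable as `none` is an assumption, since the PR does not state the default.

```python
import os
from enum import Enum


class InstrumentMode(Enum):
    NONE = "none"                    # ext_proc sidecar only
    OPENINFERENCE = "openinference"  # LangChainInstrumentor
    OPENAI = "openai"                # OpenAIInstrumentor


def read_instrument_mode(environ=os.environ) -> InstrumentMode:
    """Parse OTEL_INSTRUMENT; unset or empty is treated as "none" (assumed default)."""
    raw = environ.get("OTEL_INSTRUMENT", "").strip().lower() or "none"
    try:
        return InstrumentMode(raw)
    except ValueError:
        valid = ", ".join(mode.value for mode in InstrumentMode)
        raise ValueError(f"OTEL_INSTRUMENT={raw!r} is not one of: {valid}") from None


def needs_trace_context_middleware(mode: InstrumentMode) -> bool:
    """The ASGI middleware only matters when the agent emits spans of its own."""
    return mode is not InstrumentMode.NONE
```

Failing loudly on an unknown value is a design choice for the sketch; a typo in the variable would otherwise silently disable instrumentation.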

pdettori

Big-picture review

The observability simplification itself looks good — moving root span ownership to the ext_proc sidecar is the right direction. Four concerns to address before merging, details in inline comments.

pdettori · 2026-02-25T02:45:59Z

a2a/weather_service/src/weather_service/agent.py

+
 from a2a.server.agent_execution import AgentExecutor, RequestContext
 from a2a.server.apps import A2AStarletteApplication
 from a2a.server.events.event_queue import EventQueue


🟡 PR description should reflect all changes in this PR

Beyond the observability simplification, this PR also introduces:

asyncio.shield() to survive SSE client disconnects (new resilience pattern)

Direct task-store saving as a fallback when SSE consumer is gone (new data-flow path)

_serialize_event() for JSON-serialized LangGraph events (new wire format for ext_proc parsing)

A working cancel() implementation (previously raise Exception)

These are all fine to include here, but the PR summary should list them so reviewers (and future git log readers) know what shipped.

pdettori · 2026-02-25T02:45:59Z

a2a/weather_service/src/weather_service/agent.py

@@ -91,7 +111,36 @@ class WeatherExecutor(AgentExecutor):
    """
    A class to handle weather assistant execution for A2A Agent.


🔴 Concurrency bug: _cancelled is shared across concurrent requests

WeatherExecutor is instantiated once in run() and shared across all concurrent requests. The _cancelled flag is reset to False at the top of every execute() call:

    self._cancelled = False
    task = asyncio.ensure_future(self._do_execute(context, event_queue))

If two requests are in flight:

1. Request A is executing, user sends tasks/cancel → self._cancelled = True

2. Request B starts execute() → self._cancelled = False (clobbers A's cancel)

3. Request A's SSE disconnects → CancelledError is caught, but _cancelled is now False, so the cancel is treated as an SSE disconnect instead of an explicit cancel

This needs to be per-request state, e.g. a dict[str, bool] keyed by task ID or context ID.
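One way to hold that per-request state is sketched below. `CancelRegistry` and the task IDs are hypothetical; a set of cancelled task IDs is used here, which is equivalent to the suggested `dict[str, bool]` for this purpose.

```python
class CancelRegistry:
    """Tracks explicit cancels per task ID instead of one shared flag."""

    def __init__(self) -> None:
        self._cancelled: set[str] = set()

    def mark_cancelled(self, task_id: str) -> None:
        """Called from the tasks/cancel path."""
        self._cancelled.add(task_id)

    def was_cancelled(self, task_id: str) -> bool:
        """Called when CancelledError is caught, to tell cancel from disconnect."""
        return task_id in self._cancelled

    def forget(self, task_id: str) -> None:
        """Called when a task finishes, so the set does not grow without bound."""
        self._cancelled.discard(task_id)


def demo() -> tuple:
    registry = CancelRegistry()
    registry.mark_cancelled("task-a")  # explicit tasks/cancel for request A
    # Request B starting resets nothing globally, so A's cancel survives.
    b_cancelled = registry.was_cancelled("task-b")
    a_cancelled = registry.was_cancelled("task-a")
    registry.forget("task-a")
    return a_cancelled, b_cancelled
```

In the scenario above, request A is still seen as explicitly cancelled after request B starts, because nothing in B's path touches A's entry.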

pdettori · 2026-02-25T02:45:59Z

a2a/weather_service/src/weather_service/__init__.py

-
-# Initialize observability before importing agent
-setup_observability()
+All tracing and observability is handled externally by the AuthBridge


🟡 Docstring contradicts pyproject.toml

This says "No OTEL dependencies needed in the agent" but pyproject.toml adds five OTEL packages (opentelemetry-sdk, opentelemetry-exporter-otlp-proto-http, openinference-instrumentation-langchain, opentelemetry-instrumentation-openai, opentelemetry-instrumentation-asgi) and observability.py is still ~110 lines of active setup code.

This is "minimal" instrumentation, not "zero". The docstring (and PR title) should be corrected to avoid confusion when comparing Approach A vs. alternatives.

pdettori · 2026-02-25T02:45:59Z

a2a/weather_service/src/weather_service/agent.py

+    async def _do_execute(self, context: RequestContext, event_queue: EventQueue):
        """
        The agent allows to retrieve weather info through a natural language conversational interface
        """


🟡 Silent except Exception: pass hides real bugs

This pattern appears in several places in this PR (here, line 165, line 184). While the intent — survive SSE client disconnects — is valid, swallowing all exceptions silently (including serialization bugs, type errors, SDK contract violations) will make production debugging very difficult.

At minimum, log at warning level instead of pass or debug, so these are visible in production logs without enabling debug verbosity.

Ladas force-pushed the feat/otel-authbridge-minimal-agent-667 branch from 4c05994 to e9642b2 on February 17, 2026 12:27

Ladas and others added 7 commits February 17, 2026 14:10

chore: Regenerate uv.lock after removing OTEL dependencies

c4a0bb2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

fix: Add required artifactId when saving task to store

0f34c82

Artifact constructor requires artifactId field. Generate a UUID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Ladas mentioned this pull request Feb 17, 2026

feat: Add OTEL-enhanced ext_proc for zero-agent GenAI observability kagenti/kagenti-extensions#119

Open

4 tasks

Ladas and others added 2 commits February 18, 2026 09:08

pdettori reviewed Feb 25, 2026

pdettori added this to Kagenti Issue Prioritization Mar 4, 2026

github-project-automation bot moved this to Backlog in Kagenti Issue Prioritization Mar 4, 2026

Ladas wants to merge 10 commits into kagenti:main from Ladas:feat/otel-authbridge-minimal-agent-667
