
Agents genai autoinstrument #13

Draft

Ladas wants to merge 10 commits into kagenti:main from Ladas:agents_genai_autoinstrument

Conversation


@Ladas Ladas commented Dec 10, 2025

WIP: testing GenAI auto-instrumentation with Phoenix

Ladas and others added 3 commits December 10, 2025 11:02
- Add observability.py to k8s_debug_agent, source_code_analyzer with OpenInference instrumentation
- Add orchestrator_agent for routing tasks to specialized agents via A2A
- Add get_namespaces MCP tool to k8s-readonly-server
- Fix AutoGen async trace context propagation with global context store
- Fix A2A task lifecycle bug preventing double-completion
- Fix RBAC for k8s-readonly-server and a2a-bridge ServiceAccounts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace OpenInference instrumentation with OTEL GenAI (opentelemetry-instrumentation-openai-v2)
- Add OpenInferenceSpanProcessor for Phoenix compatibility
- Remove manual async context patching (~300 lines removed)
- Simplify observability.py for all agents:
  - k8s-debug-agent
  - orchestrator-agent
  - source-code-analyzer

The new approach uses standard OTEL GenAI semantic conventions (gen_ai.*)
with OpenInferenceSpanProcessor to convert to OpenInference format for Phoenix.
This removes the need for complex manual patching and properly handles async context.

The opentelemetry-instrumentation-openai-v2 package only has beta versions (2.0b0, 2.1b0, 2.2b0), so the constraint must be >=2.0b0 rather than >=2.0.0 for pip to resolve the dependency.

@Ladas Ladas marked this pull request as draft December 10, 2025 13:48
Ladas and others added 7 commits December 10, 2025 16:09
The opentelemetry-instrumentation-openai-v2 package only has beta
versions (2.0b0, 2.1b0, 2.2b0), and pip requires the --pre flag to
install pre-release packages.

- k8s_debug_agent: Add --pre to pip install
- orchestrator_agent: Add --pre to pip install
- source_code_analyzer: Add [tool.uv] prerelease = "allow"

The opentelemetry-instrumentation-openai-v2 package uses module path
'opentelemetry.instrumentation.openai_v2' not 'opentelemetry.instrumentation.openai'.

- Add opentelemetry-instrumentation-asyncio dependency
- Instrument asyncio to propagate context across async tasks
- This ensures OpenAI spans are children of agent spans
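For background: asyncio already copies the ambient contextvars into each newly created task, which is the baseline behavior the asyncio instrumentation extends to the cases that do not inherit it (executor calls, loop callbacks). A stdlib sketch of that baseline propagation, not the instrumentor itself:

```python
import asyncio
import contextvars

# Stand-in for OpenTelemetry's current-span context variable
current_span = contextvars.ContextVar("current_span", default=None)


async def child():
    # Tasks receive a copy of the context captured at creation time
    return current_span.get()


async def main():
    current_span.set("agent-span")
    return await asyncio.create_task(child())


print(asyncio.run(main()))  # → agent-span
```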

Add explicit spans around AutoGen agent invocations to ensure OpenAI
auto-instrumentation spans have a proper parent. This fixes the issue
where LLM spans were disconnected from the agent trace hierarchy.

Changes:
- main.py: Wrap _invoke_agent calls with tracer.start_as_current_span()
- observability.py: Remove unused OpenInference span processor
- pyproject.toml: Remove openinference-instrumentation-openllmetry dep

The GenAI to OpenInference conversion is now handled at the OTEL
Collector level using the transform/genai_to_openinference processor.
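The collector-level conversion could be wired up with the contrib transform processor along these lines; the attribute mapping shown is illustrative, not the PR's actual configuration:

```yaml
processors:
  transform/genai_to_openinference:
    trace_statements:
      - context: span
        statements:
          # Map OTEL GenAI semantic conventions onto OpenInference keys
          - set(attributes["llm.model_name"], attributes["gen_ai.request.model"]) where attributes["gen_ai.request.model"] != nil
          - set(attributes["openinference.span.kind"], "LLM") where attributes["gen_ai.system"] != nil

service:
  pipelines:
    traces:
      processors: [transform/genai_to_openinference]
```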

The fallback from openinference.instrumentation.using_attributes to
contextlib.nullcontext caused a TypeError when keyword arguments were passed:
nullcontext() accepts only a single enter_result argument, not arbitrary keywords.

This fix creates a proper no-op context manager that accepts and ignores
keyword arguments, allowing trace context setup to work correctly.

AutoGen crosses async/sync boundaries (ThreadPoolExecutor), which breaks
OpenTelemetry contextvars propagation. This caused LLM spans to be
disconnected from the agent span hierarchy.

The httpx instrumentation solves this by capturing context at the HTTP
transport layer where it IS preserved (OpenAI SDK uses httpx internally).
This ensures LLM spans become proper children of the agent span.
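The breakage is reproducible with plain contextvars; this is a stdlib sketch of the failure mode, not the httpx fix itself:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for OpenTelemetry's current-span context variable
current_span = contextvars.ContextVar("current_span", default=None)


def read_span():
    return current_span.get()


current_span.set("agent-span")

with ThreadPoolExecutor() as pool:
    # Worker threads start with a fresh context: the span is lost
    lost = pool.submit(read_span).result()
    # Explicitly carrying the context over preserves it -- analogous to
    # capturing the context at a layer where it is still intact
    ctx = contextvars.copy_context()
    kept = pool.submit(ctx.run, read_span).result()

print(lost, kept)  # → None agent-span
```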

The opentelemetry-instrumentation-openai-v2 creates LLM spans that
break context propagation when AutoGen crosses async/sync boundaries
(ThreadPoolExecutor). This resulted in orphaned LLM spans with
parent=null in separate traces.

The httpx instrumentation solves this because:
1. OpenAI SDK uses httpx internally for all API calls
2. httpx captures context at HTTP layer where it IS preserved
3. All LLM calls are captured with proper parent relationships

This approach:
- Removes opentelemetry-instrumentation-openai-v2 dependency
- Relies solely on httpx instrumentation for LLM call tracing
- OTEL Collector can enrich HTTP spans with LLM attributes if needed
