Summary
The loop-streaming orchestrator emits the same response text through two redundant pipelines, causing downstream consumers that read both channels to accumulate duplicated content.
Reproduction
When extended thinking with interleaved content is enabled (`interleaved-thinking-2025-05-14`), the orchestrator:
- Phase 1 (events): iterates content blocks and emits `content_block:start` + `content_block:end` with the full block text
- Phase 2 (yield): re-yields the same `response.text` token-by-token as `content_block:delta` events
Frontend consumers that handle both `content_block:end` (writing full text to `message.content`) and delta events (appending tokens to `message.content`) end up with `message.content` longer than the actual response. In the observed case, 2,009 chars of actual content inflated to 2,121 chars in `message.content`.
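A minimal sketch of the duplication from the consumer's side. The event names come from this report; the reducer shape and payload fields are hypothetical, not the real consumer code:

```typescript
// Hypothetical consumer that naively handles both content channels.
type StreamEvent =
  | { type: "content_block:end"; text: string }
  | { type: "content_block:delta"; token: string };

function reduceContent(events: StreamEvent[]): string {
  let content = "";
  for (const ev of events) {
    if (ev.type === "content_block:end") content += ev.text;       // full text, once
    else if (ev.type === "content_block:delta") content += ev.token; // same text again, token by token
  }
  return content;
}

// Both pipelines carry the same text, so the accumulated content is doubled:
const events: StreamEvent[] = [
  { type: "content_block:delta", token: "Hello, " },
  { type: "content_block:delta", token: "world." },
  { type: "content_block:end", text: "Hello, world." },
];
console.log(reduceContent(events).length); // 26, versus 13 actual chars
```

This is the same shape as the observed 2,009 → 2,121 inflation, just at toy scale.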
Impact
The Kepler desktop frontend has a `liveTextTail` guard in `ChatMessage.tsx` that detects the mismatch between `contentParts` text and `message.content` length, then creates a phantom tail fragment. The user sees only the last ~112 characters of a full response (starting mid-word), with the rest hidden behind tool call timeline entries.
Observed: User asked for dashboard KPI recommendations. Full response was 2,009 chars with tables and analysis. User saw only: "d I'll start there. Also happy to hear if you have other specific KPIs or filters in mind that I haven't listed."
Expected Behavior
Each content block should be delivered through one channel, not both:
- Streaming mode: yield tokens (with `content_block:delta` for observers), then emit `content_block:end` as a finalization signal carrying metadata, not redundant content
- Non-streaming fallback: emit `content_block:start` + `content_block:end` with the full content
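The expected behavior above could look roughly like this on the orchestrator side. The `emit` callback, block shape, and field names are assumptions for illustration, not the orchestrator's real API:

```typescript
// Hypothetical single-channel delivery: each block's text travels
// through exactly one channel, chosen by mode.
type Emit = (event: { type: string; [key: string]: unknown }) => void;

interface Block {
  text: string;
  tokens?: string[];
}

function deliverBlock(emit: Emit, block: Block, streaming: boolean): void {
  emit({ type: "content_block:start" });
  if (streaming && block.tokens) {
    // Streaming: deltas are the sole content channel...
    for (const token of block.tokens) {
      emit({ type: "content_block:delta", token });
    }
    // ...and :end carries only finalization metadata, no text.
    emit({ type: "content_block:end", charCount: block.text.length });
  } else {
    // Non-streaming fallback: :end carries the full text, once.
    emit({ type: "content_block:end", text: block.text });
  }
}
```

With this split, a consumer can handle both event types without any deduplication logic, which matches the "consumers should not need to deduplicate" principle below.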
Architectural Note
Per the amplifier-core design philosophy ("Explicit > implicit", "No hidden state"), consumers should not need to deduplicate content from the orchestrator. The orchestrator owns delivery policy and should own delivery correctness.
Workaround
Kepler desktop is applying a temporary frontend guard: only create `liveTextTail` during active streaming (`message.isStreaming === true`), not for finalized messages.
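A sketch of that guard's logic. The `message.content` and `isStreaming` fields follow this report; the function shape and `renderedLength` parameter are hypothetical stand-ins for the actual `ChatMessage.tsx` code:

```typescript
// Hypothetical guard: only synthesize a live tail while streaming.
interface Message {
  content: string;
  isStreaming: boolean;
}

function liveTextTail(message: Message, renderedLength: number): string | null {
  // For finalized messages, trust the rendered content parts and never
  // create a phantom tail, even if message.content is longer because
  // of the duplicated delivery described above.
  if (!message.isStreaming) return null;
  if (message.content.length <= renderedLength) return null;
  return message.content.slice(renderedLength);
}
```

This hides the symptom for finalized messages but leaves `message.content` inflated, which is why the fix belongs in the orchestrator.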
Environment
- `amplifier-module-loop-streaming` (latest via git)
- `amplifier-module-provider-anthropic` with interleaved thinking enabled
- Kepler desktop sidecar (`amplifier-distro-kepler`)