feat(realtime): WebRTC observability - diagnostics, stats & telemetry #85
AdirAmsalem merged 11 commits into main
Add comprehensive WebRTC observability to the realtime client:

- Structured logger with configurable log levels (debug/info/warn/error)
- Diagnostic event system for the connection lifecycle (ICE, signaling, phase timing, reconnects, video stalls)
- WebRTC stats collector polling at 1s intervals, with delta computation for cumulative counters
- Telemetry reporter that batches stats + diagnostics and sends them to the backend every 10s
- NullReporter pattern to eliminate conditional checks when telemetry is disabled
- Granular error codes (WEBRTC_NEGOTIATION_FAILED, ICE_CONNECTION_FAILED, etc.)
- Explicit tags (session_id, sdk_version, integration) in telemetry reports for Datadog tagging
- Quality limitation tracking from outbound-rtp stats
- Video stall detection (fps < 0.5 threshold, Twilio pattern)
- 121 unit tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
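The first bullet describes a leveled logger. The PR names a `createConsoleLogger` helper but its signature is not shown here, so the following is only a minimal sketch assuming a sink-based design: each message carries a level, and anything ranked below the configured minimum is dropped before it reaches the sink. `createLeveledLogger`, `LogSink`, and `LEVEL_RANK` are hypothetical names, not the SDK's API.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not the SDK's logger API.
type LogLevel = "debug" | "info" | "warn" | "error";

// Numeric rank per level; a message is forwarded only if its rank >= the configured minimum.
const LEVEL_RANK: Record<LogLevel, number> = { debug: 0, info: 1, warn: 2, error: 3 };

type LogData = Record<string, unknown>;
type LogSink = (level: LogLevel, message: string, data?: LogData) => void;

interface Logger {
  debug(message: string, data?: LogData): void;
  info(message: string, data?: LogData): void;
  warn(message: string, data?: LogData): void;
  error(message: string, data?: LogData): void;
}

function createLeveledLogger(minLevel: LogLevel, sink: LogSink): Logger {
  // Builds a method bound to one level that filters before forwarding to the sink.
  const at = (level: LogLevel) => (message: string, data?: LogData): void => {
    if (LEVEL_RANK[level] >= LEVEL_RANK[minLevel]) {
      sink(level, message, data);
    }
  };
  return { debug: at("debug"), info: at("info"), warn: at("warn"), error: at("error") };
}

// Example wiring to the console: debug-level chatter (e.g. per-candidate ICE logs) is dropped.
const consoleLogger = createLeveledLogger("info", (level, message, data) => {
  console[level](`[realtime] ${message}`, data ?? {});
});
```

Routing through a sink keeps filtering separate from output, so the same filter can feed the console, a test buffer, or a host application's own logger.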
The telemetry endpoint is always https://api.decart.ai/v1/telemetry, so remove telemetryUrl from RealTimeClientOptions and TelemetryReporterOptions. Also removes the unused httpBaseUrl variable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
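With the endpoint fixed, the reporter's remaining job is buffering, chunking, and tagging. A minimal sketch of that job together with the NullReporter pattern from the first commit, assuming an injected `transport` function that stands in for the upload to the telemetry endpoint (how the SDK actually issues that request is not shown here). `BatchingReporter`, `createTelemetryReporter`, the `{ tags, entries }` payload shape, and the default chunk size of 50 are all assumptions.

```typescript
// Hypothetical sketch: these types and classes are illustrative, not the SDK's TelemetryReporter API.
interface TelemetryTags {
  session_id: string;
  sdk_version: string;
  integration: string;
}

type TelemetryEntry = Record<string, unknown>;
type Transport = (body: string) => void;

interface TelemetryReporter {
  record(entry: TelemetryEntry): void;
  flush(): void;
}

// Null object: call sites invoke record()/flush() unconditionally instead of `reporter?.record(...)`.
class NullReporter implements TelemetryReporter {
  record(_entry: TelemetryEntry): void {}
  flush(): void {}
}

// Buffers entries and uploads them in chunks, attaching the same tags to every chunk.
class BatchingReporter implements TelemetryReporter {
  private buffer: TelemetryEntry[] = [];
  private readonly tags: TelemetryTags;
  private readonly transport: Transport;
  private readonly chunkSize: number;

  constructor(tags: TelemetryTags, transport: Transport, chunkSize = 50) {
    this.tags = tags;
    this.transport = transport;
    this.chunkSize = chunkSize;
  }

  record(entry: TelemetryEntry): void {
    this.buffer.push(entry);
  }

  // Meant to be driven by a timer (every 10s per this PR) and called once more on disconnect.
  flush(): void {
    const pending = this.buffer;
    this.buffer = [];
    for (let i = 0; i < pending.length; i += this.chunkSize) {
      const entries = pending.slice(i, i + this.chunkSize);
      this.transport(JSON.stringify({ tags: this.tags, entries }));
    }
  }
}

function createTelemetryReporter(
  enabled: boolean,
  tags: TelemetryTags,
  transport: Transport,
): TelemetryReporter {
  return enabled ? new BatchingReporter(tags, transport) : new NullReporter();
}
```

Because both implementations share one interface, disabling telemetry swaps the object once at construction time and leaves every call site unchanged.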
Resolve merge conflicts after merging origin/main:

- client.ts: keep observability code, adopt main's refactored initial prompt flow
- webrtc-connection.ts: combine imports, use main's initialImage with diagnostic timing
- webrtc-manager.ts: combine imports from both branches
- Fix telemetry URL in test to match production endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RTCStatsReport.values() is not in TypeScript's DOM lib types. Use forEach(), which is properly typed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
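A minimal sketch of one polling step built on `forEach()`, assuming the collector compares consecutive `inbound-rtp` video snapshots: it derives fps from the `framesDecoded` delta, computes a `packetsLostDelta`, and flags a stall below 0.5 fps (the threshold stated in this PR). `StatsReportLike`, `readInboundVideo`, and `computeTick` are hypothetical names; `type`, `kind`, `timestamp`, `framesDecoded`, and `packetsLost` are standard WebRTC stats fields.

```typescript
// Hypothetical sketch of one stats-poll step; names are illustrative, not the SDK's collector.
// Structural type: RTCStatsReport (maplike) and Map<string, any> both satisfy it.
interface StatsReportLike {
  forEach(callback: (stat: any, id: string) => void): void;
}

interface VideoSample {
  timestampMs: number;
  framesDecoded: number;
  packetsLost: number;
}

interface VideoTick {
  fps: number;
  packetsLostDelta: number;
  stalled: boolean;
}

// Stall threshold quoted in this PR (fps < 0.5).
const STALL_FPS_THRESHOLD = 0.5;

// Finds the inbound video RTP entry via forEach; if several exist, the last one wins.
function readInboundVideo(report: StatsReportLike): VideoSample | undefined {
  let sample: VideoSample | undefined;
  report.forEach((stat) => {
    if (stat.type === "inbound-rtp" && stat.kind === "video") {
      sample = {
        timestampMs: stat.timestamp,
        framesDecoded: stat.framesDecoded ?? 0,
        packetsLost: stat.packetsLost ?? 0,
      };
    }
  });
  return sample;
}

// Turns two cumulative snapshots into per-interval values.
function computeTick(prev: VideoSample, curr: VideoSample): VideoTick {
  const elapsedSec = (curr.timestampMs - prev.timestampMs) / 1000;
  // Clamp at zero so a counter that moves backwards never yields a negative delta.
  const framesDelta = Math.max(0, curr.framesDecoded - prev.framesDecoded);
  const fps = elapsedSec > 0 ? framesDelta / elapsedSec : 0;
  return {
    fps,
    packetsLostDelta: Math.max(0, curr.packetsLost - prev.packetsLost),
    // Only judge a stall when time has actually elapsed between snapshots.
    stalled: elapsedSec > 0 && fps < STALL_FPS_THRESHOLD,
  };
}
```

Keeping the snapshot-to-delta conversion a pure function means it can be tested with plain `Map` fixtures, without a live `RTCPeerConnection`.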
…el propagation to telemetry reporter

- remove `startStats` from the `RealTimeClient` public surface
- keep internal auto stats collection for telemetry
- stop exporting `StatsOptions` from the package index
```typescript
// Auto-start stats when telemetry is enabled
if (opts.telemetryEnabled) {
  startStatsCollection();
}
```
Stats events never fire when telemetry is disabled
Medium Severity
The stats event is part of the public Events type, but startStatsCollection is only called when opts.telemetryEnabled is true. Both call sites — the auto-start at connection and the handleConnectionStateChange handler for reconnects — are gated behind telemetry checks. This means client.on("stats", ...) listeners never fire when telemetry is disabled, coupling local stats observation with remote telemetry reporting. A user who wants local WebRTC stats without sending data to Decart's servers has no way to get them.
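One way to decouple the two, sketched under the assumption that the client can start and stop its poller on demand: reference-count stats consumers so polling runs whenever either a local `stats` listener or the telemetry reporter is subscribed. Only `startStatsCollection`, `opts.telemetryEnabled`, and the `stats` event come from the PR; `StatsFeed`, `subscribe`, and the `startPolling`/`stopPolling` callbacks are hypothetical.

```typescript
// Hypothetical sketch: polling is tied to "does anyone consume stats", not to telemetryEnabled.
type StatsSample = Record<string, number>;
type StatsListener = (sample: StatsSample) => void;

class StatsFeed {
  private readonly listeners = new Set<StatsListener>();
  private running = false;
  private readonly startPolling: () => void;
  private readonly stopPolling: () => void;

  constructor(startPolling: () => void, stopPolling: () => void) {
    this.startPolling = startPolling;
    this.stopPolling = stopPolling;
  }

  // Both a user's client.on("stats", ...) handler and the telemetry reporter would subscribe here.
  subscribe(listener: StatsListener): () => void {
    this.listeners.add(listener);
    if (!this.running) {
      this.running = true;
      this.startPolling();
    }
    // The returned function unsubscribes; polling stops when the last consumer leaves.
    return () => {
      this.listeners.delete(listener);
      if (this.running && this.listeners.size === 0) {
        this.running = false;
        this.stopPolling();
      }
    };
  }

  publish(sample: StatsSample): void {
    this.listeners.forEach((listener) => listener(sample));
  }

  get isPolling(): boolean {
    return this.running;
  }
}
```

Under this design, the reconnect path would consult `isPolling` rather than the telemetry flag when deciding whether to restart collection, and telemetry becomes just one more subscriber.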
```typescript
  phase: "initial-prompt",
  durationMs: performance.now() - promptStart,
  success: true,
});
```
Missing failure diagnostic events for avatar-image and prompt phases
Low Severity
The avatar-image and initial-prompt phases only emit phaseTiming diagnostic events on success. If setImageBase64 or sendInitialPrompt rejects (e.g., timeout, WebSocket close), the await throws and the emitDiagnostic call is skipped with no failure event emitted. This is inconsistent with the websocket and webrtc-handshake phases, which both emit phaseTiming with success: false on failure, making connection failure diagnosis incomplete for these phases.
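A minimal sketch of a fix, assuming each phase can be wrapped in a helper that emits a `phaseTiming`-style event on both outcomes and then rethrows, so existing error handling is untouched. `timedPhase`, the `PhaseTiming` shape, and its `error` field are hypothetical; the PR's actual `emitDiagnostic` signature is not shown here.

```typescript
// Hypothetical sketch: `timedPhase` and `PhaseTiming` are illustrative, not the SDK's diagnostics API.
interface PhaseTiming {
  phase: string;
  durationMs: number;
  success: boolean;
  error?: string;
}

// Runs one connection phase and emits a timing event whether it resolves or rejects,
// then rethrows so the caller's existing error handling is unchanged.
async function timedPhase<T>(
  phase: string,
  run: () => Promise<T>,
  emit: (event: PhaseTiming) => void,
): Promise<T> {
  const start = performance.now();
  let result: T;
  try {
    result = await run();
  } catch (err) {
    emit({
      phase,
      durationMs: performance.now() - start,
      success: false,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
  // Emitted outside the try so an exception thrown by emit() is not misreported as a phase failure.
  emit({ phase, durationMs: performance.now() - start, success: true });
  return result;
}
```

Wrapping the avatar-image step would then look roughly like `await timedPhase("avatar-image", () => setImageBase64(image), onTiming)`, where `image` and `onTiming` are placeholders for whatever the client already has in scope.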


Summary
- Structured logger (`src/utils/logger.ts`) with configurable log levels and a `createConsoleLogger` helper
- Diagnostic event system (`src/realtime/diagnostics.ts`) for the full connection lifecycle: ICE candidates, signaling state, peer connection state, phase timing, reconnects, video stalls
- WebRTC stats collector (`src/realtime/webrtc-stats.ts`) polling at 1s with delta computation for cumulative counters (`packetsLostDelta`, `framesDroppedDelta`, `freezeCountDelta`, `freezeDurationDelta`)
- Telemetry reporter (`src/realtime/telemetry-reporter.ts`) batching stats + diagnostics every 10s with explicit Datadog tags (`session_id`, `sdk_version`, `integration`)
- NullReporter pattern to eliminate `?.` checks when telemetry is disabled
- Granular error codes (`WEBRTC_NEGOTIATION_FAILED`, `ICE_CONNECTION_FAILED`, `MEDIA_STREAM_FAILED`, `DATA_CHANNEL_FAILED`)
- Quality limitation tracking (`qualityLimitationReason`)

New files
- `packages/sdk/src/utils/logger.ts`
- `packages/sdk/src/realtime/diagnostics.ts`
- `packages/sdk/src/realtime/webrtc-stats.ts`
- `packages/sdk/src/realtime/telemetry-reporter.ts`

Modified files
- `packages/sdk/src/index.ts`: new exports, logger/telemetry wiring
- `packages/sdk/src/realtime/client.ts`: telemetry reporter integration, stall detection, stats auto-start
- `packages/sdk/src/realtime/webrtc-manager.ts`: diagnostic emitter threading
- `packages/sdk/src/realtime/webrtc-connection.ts`: diagnostic event emission for ICE/signaling/connection states
- `packages/sdk/src/realtime/subscribe-client.ts`: diagnostic emitter support
- `packages/sdk/src/utils/errors.ts`: new WebRTC error codes
- `packages/sdk/tests/unit.test.ts`: 10 new tests (121 total)

Test plan
- … (`npx vitest run packages/sdk/tests/unit.test.ts`)
- … `RTCStatsReport.values()` DOM type issues)
- … `telemetry: true` and verify telemetry reports reach the backend
- … `sdk.webrtc.*` metrics with tags

🤖 Generated with Claude Code
Note
Medium Risk
Touches realtime/WebRTC connection, reconnection, and error-handling paths and adds background telemetry uploads, which could affect connection stability/performance if misbehaving despite being opt-out.
Overview
Adds WebRTC observability to the SDK’s realtime client: new `diagnostic` and `stats` events emit connection phase timings, ICE/signaling/peer-state changes, reconnect outcomes, the selected candidate pair, and video-stall detection, with periodic `getStats()` polling and delta/bitrate calculations.

Introduces opt-out telemetry reporting (buffered until `session_id` is available, chunked uploads with auth headers, and keepalive on disconnect) plus a structured `Logger` API, wires both through `createDecartClient`/realtime + subscribe flows, and replaces generic WebRTC errors with classified error codes; unit tests expanded to cover telemetry buffering, stats collection, logger filtering, and error classification.

Written by Cursor Bugbot for commit 0920e6f.