feat: align tracing attributes with .NET SDK conventions #126

Open
torosent wants to merge 14 commits into main from torosent/tracing-alignment

@torosent torosent commented Mar 3, 2026

Summary

What changed?

  • Aligned JS SDK tracing attributes with .NET SDK conventions: added execution_id on creation spans, version on activity execution spans, name/instance_id on timer spans, and durabletask.task.status on orchestration completion
  • Implemented retroactive span emission model matching .NET/Java SDKs: emit Client spans at activity/sub-orchestration completion with historical scheduling timestamps, and timer spans with creation-to-fired duration
  • Fixed distributed-tracing sample OTel imports for compatibility
  • Added Jaeger screenshots demonstrating the tracing improvements

Why is this change needed?

The JS SDK's tracing was missing several attributes present in the .NET SDK and used a proactive-only span emission model (point-in-time spans at scheduling). Both .NET and Java SDKs emit retroactive spans at completion time with historical timestamps, providing accurate scheduling-to-completion duration visibility in trace tools like Jaeger. This PR closes the gap.

Issues / work items


Project checklist

  • Release notes are not required for the next release
    • Otherwise: Notes added to CHANGELOG.md
  • Backport is not required
  • All required tests have been added/updated (unit tests, E2E tests)
  • Breaking change?
    • No. All changes are additive (new attributes, new spans). Existing behavior is preserved.

AI-assisted code disclosure (required)

Was an AI tool used? (select one)

  • No
  • Yes, AI helped write parts of this PR (e.g., GitHub Copilot)
  • Yes, an AI agent generated most of this PR

If AI was used:

  • Tool(s): GitHub Copilot CLI Agent (Claude Opus 4.6)
  • AI-assisted areas/files:
    • packages/durabletask-js/src/tracing/trace-helper.ts: new functions (emitRetroactiveActivityClientSpan, emitRetroactiveSubOrchClientSpan, processNewEventsForTracing, setOrchestrationStatusFromActions, orchestrationStatusToString); updated emitSpanForTimer with a startTime parameter
    • packages/durabletask-js/src/tracing/index.ts: new exports
    • packages/durabletask-js/src/worker/task-hub-grpc-worker.ts: integrated processNewEventsForTracing and setOrchestrationStatusFromActions
    • packages/durabletask-js/test/tracing.spec.ts: 27 new unit tests
    • examples/azure-managed/distributed-tracing/index.ts: fixed OTel Resource import compatibility
    • CHANGELOG.md: updated
  • What you changed after AI output: Reviewed all code, verified attribute names match .NET Schema.cs constants, verified proto API usage, confirmed test assertions

AI verification (required if AI was used):

  • I understand the code and can explain it
  • I verified referenced APIs/types exist and are correct
  • I reviewed edge cases/failure paths (timeouts, retries, cancellation, exceptions)
  • I reviewed concurrency/async behavior
  • I checked for unintended breaking or behavior changes

Testing

Automated tests

  • Result: Passed, 807 tests (743 core + 64 azure-managed), including 27 new tracing tests

Manual validation (only if runtime/behavior changed)

  • Environment: macOS (arm64), Node.js 25.7.0, DTS emulator (Docker), Jaeger (Docker)
  • Steps + observed results:
    1. Started DTS emulator + Jaeger via docker compose up -d
    2. Ran the examples/azure-managed/distributed-tracing sample; both orchestrations completed successfully
    3. Verified traces in Jaeger: 40 spans for dataPipelineOrchestrator (up from 29), 14 spans for sequenceOrchestrator (up from 11)
    4. Confirmed retroactive Client spans have instance_id attribute and historical start times
    5. Confirmed durabletask.task.status=Completed on orchestration spans
  • Evidence: Screenshots below

Details

1. Attribute Alignment

| Span Type | New Attribute | Value |
| --- | --- | --- |
| create_orchestration | durabletask.task.execution_id | Execution ID from request |
| activity (Server) | durabletask.task.version | Activity version (also in span name) |
| timer (Internal) | durabletask.task.name | Orchestration name |
| timer (Internal) | durabletask.task.instance_id | Instance ID |
| orchestration (Server) | durabletask.task.status | "Completed", "Failed", "Terminated", etc. |
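
As a rough illustration of how these attributes might be assembled before being set on a span, here is a minimal sketch. The attribute keys come from the table above; the `ATTR` constant, `compactAttributes`, `timerSpanAttributes`, and `activityExecutionAttributes` are hypothetical names for this example and are not the SDK's actual exports. Keys that this PR does not spell out in full (such as type or task_id) are left out.

```typescript
// Attribute keys as listed in the table above. All identifiers below are
// hypothetical helpers for illustration, not the SDK's real API.
const ATTR = {
  executionId: "durabletask.task.execution_id",
  version: "durabletask.task.version",
  name: "durabletask.task.name",
  instanceId: "durabletask.task.instance_id",
  status: "durabletask.task.status",
};

type SpanAttributes = { [key: string]: string };

// Copies only defined, non-empty values so that optional fields (for example
// a missing version) never show up as empty attributes on the span.
function compactAttributes(input: { [key: string]: string | undefined }): SpanAttributes {
  const out: SpanAttributes = {};
  for (const key in input) {
    const value = input[key];
    if (value !== undefined && value !== "") {
      out[key] = value;
    }
  }
  return out;
}

// Timer spans carry the owning orchestration's name and instance ID.
function timerSpanAttributes(orchName: string, instanceId: string): SpanAttributes {
  const raw: { [key: string]: string | undefined } = {};
  raw[ATTR.name] = orchName;
  raw[ATTR.instanceId] = instanceId;
  return compactAttributes(raw);
}

// Activity execution spans add the version when one is present.
function activityExecutionAttributes(
  name: string,
  instanceId: string,
  version?: string
): SpanAttributes {
  const raw: { [key: string]: string | undefined } = {};
  raw[ATTR.name] = name;
  raw[ATTR.instanceId] = instanceId;
  raw[ATTR.version] = version;
  return compactAttributes(raw);
}
```

Dropping undefined values is an assumption of this sketch; it keeps unversioned activities from emitting an empty version attribute.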

2. Retroactive Span Emission Model

Previously, the JS SDK only emitted activity/sub-orchestration Client spans proactively at scheduling time (point-in-time spans with zero duration). The .NET and Java SDKs emit these retroactively at completion time with historical scheduling timestamps.

This PR adds retroactive span emission matching the .NET pattern (EmitTraceActivityForTaskCompleted/Failed, EmitTraceActivityForTimer):

| Retroactive Span | Trigger | Start Time Source |
| --- | --- | --- |
| activity:{name} (Client) | TaskCompleted/TaskFailed | TaskScheduled event timestamp |
| orchestration:{name} (Client) | SubOrchCompleted/SubOrchFailed | SubOrchCreated event timestamp |
| orchestration:{name}:timer (Internal) | TimerFired | TimerCreated event timestamp |

Proactive Client spans are preserved for trace context injection (Server span parents). Retroactive spans add timeline/duration coverage.
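
The pairing logic behind retroactive activity spans can be sketched as follows. This is a simplified model, not the code in trace-helper.ts: `HistoryEvent`, `RetroactiveSpan`, and `buildRetroactiveActivitySpans` are hypothetical, timestamps are plain millisecond numbers, and the sketch returns span descriptions instead of calling OpenTelemetry. It also assumes that only new events should produce spans, since older completions would already have been handled by an earlier execution.

```typescript
// Hypothetical, simplified shapes. The real SDK works with protobuf history
// events and OpenTelemetry spans.
type HistoryEvent =
  | { kind: "TaskScheduled"; eventId: number; name: string; timestampMs: number }
  | { kind: "TaskCompleted"; taskScheduledId: number; timestampMs: number }
  | { kind: "TaskFailed"; taskScheduledId: number; timestampMs: number };

interface RetroactiveSpan {
  name: string;
  kind: "Client";
  startMs: number; // historical start: when the task was scheduled
  endMs: number;   // when the completion or failure event was recorded
  failed: boolean;
}

function buildRetroactiveActivitySpans(
  pastEvents: HistoryEvent[],
  newEvents: HistoryEvent[]
): RetroactiveSpan[] {
  // Index every TaskScheduled event by its event ID so that a later
  // completion can look up the original scheduling timestamp.
  const scheduled: { [eventId: number]: { name: string; timestampMs: number } } = {};
  for (const e of pastEvents.concat(newEvents)) {
    if (e.kind === "TaskScheduled") {
      scheduled[e.eventId] = { name: e.name, timestampMs: e.timestampMs };
    }
  }

  const spans: RetroactiveSpan[] = [];
  // Assumption: only new completion/failure events produce spans.
  for (const e of newEvents) {
    if (e.kind === "TaskScheduled") continue;
    const origin = scheduled[e.taskScheduledId];
    if (origin === undefined) continue; // no matching schedule event, skip
    spans.push({
      name: "activity:" + origin.name,
      kind: "Client",
      startMs: origin.timestampMs,
      endMs: e.timestampMs,
      failed: e.kind === "TaskFailed",
    });
  }
  return spans;
}
```

The same lookup shape would apply to sub-orchestrations and timers, with the creation event supplying the start time.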

Span Types Summary

| Span | Kind | Name Format | Key Attributes |
| --- | --- | --- | --- |
| Create Orchestration | Producer | create_orchestration:{name}[@({version})] | type, name, instance_id, execution_id, version |
| Orchestration Execution | Server | orchestration:{name}[@({version})] | type, name, instance_id, status |
| Activity Scheduling (proactive) | Client | activity:{name}[@({version})] | type, name, task_id |
| Activity Completion (retroactive) | Client | activity:{name}[@({version})] | type, name, instance_id, task_id, startTime=historical |
| Activity Execution | Server | activity:{name}[@({version})] | type, name, instance_id, task_id, version |
| Timer (retroactive) | Internal | orchestration:{orchName}:timer | type, name, instance_id, fire_at, startTime=historical |
| Event (from worker) | Producer | orchestration_event:{eventName} | type, name, target_instance_id |
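
The name formats above can be read as a prefix, a name, and an optional `@(version)` suffix. The sketch below encodes that reading; it assumes the square brackets in the table mean "omitted when no version is set". `buildSpanName` and `buildTimerSpanName` are hypothetical helper names.

```typescript
// Builds "<prefix>:<name>" and appends "@(<version>)" only when a non-empty
// version is supplied. Hypothetical helper, not the SDK's actual function.
function buildSpanName(prefix: string, name: string, version?: string): string {
  const base = prefix + ":" + name;
  return version ? base + "@(" + version + ")" : base;
}

// Timer spans use the owning orchestration's name with a fixed ":timer" suffix.
function buildTimerSpanName(orchName: string): string {
  return "orchestration:" + orchName + ":timer";
}
```

For example, `buildSpanName("activity", "GetWeather", "1.0")` yields `activity:GetWeather@(1.0)`, while omitting the version yields `activity:GetWeather`.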

Screenshots (Jaeger)

Jaeger: trace search showing FanOutFanIn trace (25 spans):

[Screenshot: Jaeger trace list]

Jaeger: full trace detail with proper span durations (timer, parallel activities, aggregation):

[Screenshot: Jaeger trace detail]

Jaeger: span detail showing attributes (aligned with .NET SDK schema):

[Screenshot: Jaeger span detail]


Notes for reviewers

  • The retroactive spans are emitted before the orchestrator executor runs (in _executeOrchestratorInternal), matching the .NET worker pattern
  • The JS OTel API cannot override span IDs (unlike .NET's SetSpanId reflection hack), so retroactive Client spans get new span IDs rather than matching the original scheduling context. This is a known platform limitation shared with Java.
  • Replay span identity: because of the same limitation, JS stores the original span ID in the durabletask.task.replay_span_id attribute for cross-replay correlation instead. This is not a fixable gap.

torosent and others added 4 commits March 3, 2026 15:28
- Add execution_id attribute on orchestration creation spans
- Add version attribute on activity execution spans (name + version)
- Add name and instance_id attributes on timer spans
- Add durabletask.task.status attribute on orchestration completion
- Pass instanceId through processActionsForTracing for timer enrichment
- Add setOrchestrationStatusFromActions helper with status string mapping
- Add 13 new unit tests covering all attribute additions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 23:29
Copilot AI left a comment

Pull request overview

Aligns the JS SDK’s OpenTelemetry spans/attributes with DurableTask .NET conventions to improve cross-SDK trace parity and observability.

Changes:

  • Add new tracing attributes: orchestration execution_id, activity version, timer task.name/task.instance_id, and orchestration completion durabletask.task.status.
  • Wire orchestration instanceId into tracing action processing so timer spans can be enriched.
  • Extend unit tests and update the Azure-managed distributed tracing sample + changelog.

Reviewed changes

Copilot reviewed 6 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/durabletask-js/src/tracing/trace-helper.ts | Adds execution_id, activity version, timer enrichment, and orchestration status helper; plumbs instanceId into action tracing. |
| packages/durabletask-js/src/tracing/index.ts | Re-exports the new orchestration status helper. |
| packages/durabletask-js/src/worker/task-hub-grpc-worker.ts | Passes instanceId to tracing action processing and sets orchestration completion status on spans. |
| packages/durabletask-js/test/tracing.spec.ts | Adds unit tests covering the new attributes and status mapping. |
| examples/azure-managed/distributed-tracing/index.ts | Updates sample OTel bootstrap/resource initialization and span processor wiring. |
| CHANGELOG.md | Adds an “Upcoming” entry for the tracing alignment. |
| doc/images/tracing/jaeger-trace-list.png | Adds/updates tracing screenshot asset. |

torosent and others added 8 commits March 3, 2026 15:40
…etion

Implement the retroactive span emission pattern matching the .NET SDK's
EmitTraceActivityForTaskCompleted/Failed and EmitTraceActivityForTimer:

- emitRetroactiveActivityClientSpan(): Creates Client spans at activity
  completion/failure time with historical startTime from TaskScheduled event
- emitRetroactiveSubOrchClientSpan(): Same for sub-orchestration completions
- emitSpanForTimer(): Now accepts optional startTime parameter for creation-
  to-fired duration coverage
- processNewEventsForTracing(): Pre-processes new history events (before
  orchestrator executor runs) to emit retroactive spans, matching .NET's
  worker-level tracing pattern

This addresses the architectural gap where JS emitted scheduling spans only
at scheduling time (proactive), while .NET and Java emit retroactive spans
at completion time with accurate scheduling-to-completion duration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…map lookup

- Extract common emitRetroactiveClientSpan() helper to eliminate duplication
  between emitRetroactiveActivityClientSpan and emitRetroactiveSubOrchClientSpan
- Replace orchestrationStatusToString switch/case with object lookup map
- Hoist duplicate orchName computation to shared scope in worker

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
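
For readers unfamiliar with the switch-to-lookup refactor mentioned in this commit, a minimal sketch of the pattern follows. The local `OrchestrationStatus` enum, the `STATUS_NAMES` table, and `statusName` are hypothetical stand-ins; the real enum comes from the generated protobuf types, and the "Unknown" fallback is an assumption of this example.

```typescript
// Hypothetical local enum covering only the statuses named in this PR.
enum OrchestrationStatus {
  Completed,
  Failed,
  Terminated,
}

// Lookup table replacing a switch/case: one entry per known status.
const STATUS_NAMES: { [status: number]: string } = {
  [OrchestrationStatus.Completed]: "Completed",
  [OrchestrationStatus.Failed]: "Failed",
  [OrchestrationStatus.Terminated]: "Terminated",
};

// Returns the display string used for durabletask.task.status.
// Assumption: unmapped values fall back to "Unknown".
function statusName(status: number): string {
  const name = STATUS_NAMES[status];
  return name !== undefined ? name : "Unknown";
}
```

A table keeps the mapping declarative, so adding a status means adding one entry rather than another case branch.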
- Set durabletask.task.status in failure path too (not just success path)
- Extract executionId to local variable instead of double getExecutionid() call
- Use traceExporter option directly instead of spanProcessors with 'as any' cast
- Add PR link (#126) to CHANGELOG entries per repo convention

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… spans

- Remove timer span emission from processActionsForTracing (timer spans are
  now emitted only retroactively from TimerFired events via
  processNewEventsForTracing, matching .NET/Java behavior)
- Add instance_id attribute to event spans from worker (emitSpanForEventSent),
  matching .NET's StartTraceActivityForEventRaisedFromWorker
- Update tests to verify timer spans are no longer created proactively

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n_id

- setOrchestrationStatusFromActions now sets ERROR span status when
  orchestration completes with FAILED (matching .NET behavior where span
  gets ActivityStatusCode.Error with result message). Previously JS always
  set OK on success path even when executor reported FAILED.
- Removed separate setSpanOk call; status is now fully determined by the
  completion action status.
- Added execution_id attribute to event spans from worker, matching .NET's
  StartTraceActivityForEventRaisedFromWorker.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
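
The decision described in this commit, deriving the span status from the completion action rather than always setting OK, might look roughly like the sketch below. `CompletionAction`, `SpanStatus`, and `spanStatusFromCompletion` are hypothetical names. The sketch assumes that any status other than "Failed" maps to OK, and it uses the generic 'Orchestration failed' text as the fallback when no result message is available.

```typescript
// Hypothetical, simplified shapes; the real code reads a protobuf completion
// action and sets an OpenTelemetry span status.
interface CompletionAction {
  status: string;          // e.g. "Completed" or "Failed"
  resultMessage?: string;  // failure details, when available
}

interface SpanStatus {
  code: "OK" | "ERROR";
  message?: string;
}

function spanStatusFromCompletion(action: CompletionAction): SpanStatus {
  if (action.status === "Failed") {
    // Prefer the specific failure message; fall back to a generic one.
    const message =
      action.resultMessage !== undefined ? action.resultMessage : "Orchestration failed";
    return { code: "ERROR", message: message };
  }
  // Assumption of this sketch: every non-failed completion is reported as OK.
  return { code: "OK" };
}
```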
- Add 2-second timer to sequenceOrchestrator to showcase retroactive timer spans
- Capture fresh screenshots showing timer span with 2.01s duration and all
  attributes (fire_at, instance_id, name, task_id, type=timer)
- Remove old retroactive-client-span screenshot, add timer-span screenshot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…screenshots

Rewrote the distributed-tracing sample to match the Java SDK's tracing
sample (FanOutFanIn): 1s timer → 5× parallel GetWeather → CreateSummary.
This produces a trace structure directly comparable to the Java PR screenshots.

Updated Jaeger screenshots to match the Java PR pattern:
- jaeger-trace-list.png: Trace search showing FanOutFanIn trace (25 spans)
- jaeger-full-trace-detail.png: Full trace detail with span hierarchy
- jaeger-span-detail.png: Span detail showing attributes (aligned with .NET)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
torosent and others added 2 commits March 3, 2026 18:08
…ations

Address review feedback from @YunchuWang: processActionsForTracing was
creating CLIENT spans at scheduling time, AND processNewEventsForTracing
was creating retroactive CLIENT spans at completion time — resulting in
duplicate CLIENT spans per activity/sub-orchestration.

.NET never creates actual CLIENT spans for scheduling. Instead, it generates
a random span ID (ActivitySpanId.CreateRandom()) and constructs a TraceContext
directly, without creating a span.

Changed to match .NET exactly:
- Replaced startSpanForSchedulingTask() with injectTraceContextForSchedulingTask()
  which generates a random span ID and injects trace context without creating a span
- Same for startSpanForSchedulingSubOrchestration() → injectTraceContextForSchedulingSubOrchestration()
- Only the retroactive CLIENT spans (from processNewEventsForTracing) now exist

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
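
To make the "inject trace context without creating a span" idea concrete, here is a minimal sketch that generates a random span ID and formats a W3C `traceparent` value from it. All function names are hypothetical and this is not the body of injectTraceContextForSchedulingTask. `Math.random` is used only to keep the example dependency-free; a real implementation would more likely use a proper random source.

```typescript
// Returns `byteCount` random bytes as lowercase hex. Math.random keeps the
// sketch dependency-free; it is not cryptographically secure.
function randomHex(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    const byte = Math.floor(Math.random() * 256);
    out += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return out;
}

// W3C Trace Context span IDs are 8 bytes (16 hex chars) and must not be all zeros.
function randomSpanId(): string {
  let id = randomHex(8);
  while (id === "0000000000000000") {
    id = randomHex(8);
  }
  return id;
}

// Formats a traceparent header value: version-traceId-spanId-flags.
function buildTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
}

// Produces the context to attach to a scheduling action. No span is started;
// the generated span ID simply becomes the parent ID seen by the downstream
// Server span.
function createSchedulingTraceContext(
  parentTraceId: string,
  sampled: boolean
): { spanId: string; traceparent: string } {
  const spanId = randomSpanId();
  return { spanId: spanId, traceparent: buildTraceparent(parentTraceId, spanId, sampled) };
}
```

Because no span object exists at scheduling time, nothing is exported until the retroactive Client span is emitted at completion.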
Address review feedback from @YunchuWang: in the catch block, setSpanError
sets the span with the actual exception message, but setOrchestrationStatusFromActions
would then overwrite it with a generic 'Orchestration failed' because the
synthesized action has undefined result.

Now the catch path sets the status attribute directly without calling
setOrchestrationStatusFromActions, preserving the specific error message.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3 participants