-
Notifications
You must be signed in to change notification settings - Fork 9
Description
Bug: WhenAllTask crashes when children complete after fail-fast
Branch with fix: copilot-finds/bug/fix-whenall-child-completion-after-fail-fast
A PR could not be created automatically because the repository settings do not allow GitHub Actions to create pull requests. Please create a PR from the branch above.
Problem
WhenAllTask.onChildCompleted() has two bugs that can crash orchestrations using the fan-out/fan-in pattern:
Bug 1: Throw after fail-fast
When a WhenAllTask fails fast on the first child failure, it sets _isComplete = true. If any remaining child task later completes (in the same or subsequent event batch), onChildCompleted() throws "Task is already completed". This exception propagates through the executor and crashes the orchestration with an unexpected error — or worse, overwrites a correctly completed orchestration status when the orchestrator caught the WhenAll failure.
File: packages/durabletask-js/src/task/when-all-task.ts:29-31
Bug 2: Fall-through on last-task-fails
When the failing task happens to be the last child to complete, the fail-fast block sets _isComplete = true but falls through to the result-collection block (this._tasks.map(t => t.getResult())). Since getResult() throws when _exception is set, this crashes the orchestration through an unintended code path, bypassing the generator entirely.
File: packages/durabletask-js/src/task/when-all-task.ts:35-43
Root Cause
- The
onChildCompletedguard usedthrowinstead ofreturnfor already-complete tasks.WhenAnyTaskcorrectly usesif (!this.isComplete)—WhenAllTaskwas inconsistent. - Missing
returnafter the fail-fast block allowed fall-through to the result-collection code that callsgetResult()on all tasks, including the failed one.
Fix (implemented on branch)
WhenAllTask.onChildCompleted() (2 changes)
- Replace
throw new Error("Task is already completed")withreturnto silently ignore subsequent child completions after fail-fast - Add
returnafter the fail-fast exception assignment to prevent fall-through togetResult()
RuntimeOrchestrationContext.resume() (1 change)
- Add early return guard when
_isCompleteis true, preventing attempts to resume a finished generator when subsequent task completion events triggerresume()on an already-completed context
Testing (3 new tests, all passing)
- "should fail whenAll correctly when the failing task is the last to complete" — Verifies Bug 2
- "should not crash when additional tasks complete after whenAll fails fast" — Verifies Bug 1
- "should preserve orchestration result when whenAll failure is caught and other tasks complete" — Verifies the catch-and-handle scenario
All 780 tests pass. Lint clean.
Risk
Low risk. The fix only changes behavior for the already-complete case (which previously threw) and the fall-through case (which was always incorrect). WhenAnyTask already uses the return pattern.