fix(tui): prevent false 'task crashed' errors due to race condition #293

Merged
echobt merged 1 commit into master from fix/subagent-crash-detection-race-condition on Jan 27, 2026

Conversation

@echobt (Contributor) commented Jan 27, 2026

Problem

The check_crashed_tasks() function was incorrectly detecting normal task completions as crashes. Users were seeing error messages like:

Subagent task terminated unexpectedly (possible panic or cancellation)

This happened even for ordinary tools such as ListSubagents, Glob, and LS.

Root Cause

There was a race condition in the crash detection logic:

  1. A task sends ToolEvent::Completed via the channel
  2. The task terminates (is_finished() == true)
  3. Before the event is processed by the event loop, check_crashed_tasks() runs on a tick
  4. It sees is_finished() == true and incorrectly assumes it's a crash
  5. It sends ANOTHER ToolEvent::Failed

The original code assumed that a task which was finished but still present in running_tool_tasks must have crashed. Because events are processed asynchronously, however, there is a window in which the task has finished and sent its event, but the event loop has not yet processed that event and removed the task.
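The window described above can be reproduced in a small, dependency-free sketch. This is not the cortex-tui code: it uses `std::thread` and `std::sync::mpsc` as stand-ins for the async task and event channel, and `ToolEvent`, `naive_check`, and `reproduce_race` are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the event type named in this PR.
#[derive(Debug, PartialEq)]
enum ToolEvent {
    Completed,
    Failed,
}

/// The flawed heuristic: a finished task that is still tracked is
/// assumed to have crashed, so a Failed event is sent for it.
fn naive_check(handle: &thread::JoinHandle<()>, tx: &mpsc::Sender<ToolEvent>) {
    if handle.is_finished() {
        let _ = tx.send(ToolEvent::Failed);
    }
}

/// Runs one task that completes normally and returns every event that
/// ends up in the channel.
fn reproduce_race() -> Vec<ToolEvent> {
    let (tx, rx) = mpsc::channel();
    let task_tx = tx.clone();
    let handle = thread::spawn(move || {
        // Step 1: the task reports success through the channel.
        task_tx.send(ToolEvent::Completed).unwrap();
        // Step 2: the task terminates.
    });

    // Step 3: the "tick" runs after the task has finished but before
    // anything has drained the channel.
    while !handle.is_finished() {
        thread::yield_now();
    }

    // Steps 4 and 5: the naive check misreads completion as a crash.
    naive_check(&handle, &tx);

    drop(tx);
    handle.join().unwrap();
    rx.iter().collect()
}
```

Calling `reproduce_race()` yields both a `Completed` and a spurious `Failed` event for a task that never crashed, which matches the symptom users reported.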

Solution

The fix awaits the JoinHandle to determine whether the task actually panicked or was cancelled:

  • If handle.await returns Ok(()), the task completed normally and its event is pending in the channel - we do NOT send a Failed event
  • If handle.await returns Err(JoinError), the task actually crashed (panic or cancellation) - we send a Failed event
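A rough sketch of the same decision, again using `std::thread` rather than the async runtime so that it runs without external crates. Here `handle.join()` plays the role of `handle.await`; note that standard threads can only report a panic, not a cancellation. `ToolEvent`, `reap_finished`, and `run_task` are hypothetical names, not the PR's actual code.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the event type named in this PR.
#[derive(Debug, PartialEq)]
enum ToolEvent {
    Completed,
    Failed,
}

/// Inspects the join result instead of trusting `is_finished()` alone.
/// Returns true only if a Failed event was sent.
fn reap_finished(handle: thread::JoinHandle<()>, tx: &mpsc::Sender<ToolEvent>) -> bool {
    match handle.join() {
        // Normal completion: the task's own event is already queued,
        // so nothing more is sent.
        Ok(()) => false,
        // The task panicked: no completion event will ever arrive.
        Err(_) => {
            let _ = tx.send(ToolEvent::Failed);
            true
        }
    }
}

/// Runs one task, optionally making it panic, and returns every event
/// that ends up in the channel.
fn run_task(should_panic: bool) -> Vec<ToolEvent> {
    let (tx, rx) = mpsc::channel();
    let task_tx = tx.clone();
    let handle = thread::spawn(move || {
        if should_panic {
            panic!("simulated crash");
        }
        task_tx.send(ToolEvent::Completed).unwrap();
    });

    // Same timing as the race: check after the task has finished but
    // before the channel has been drained.
    while !handle.is_finished() {
        thread::yield_now();
    }
    reap_finished(handle, &tx);

    drop(tx);
    rx.iter().collect()
}
```

With this check, `run_task(false)` leaves only the task's own `Completed` event in the channel, while `run_task(true)` produces a single `Failed` event.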

Testing

  • cargo check -p cortex-tui
  • cargo fmt
  • Existing tests pass (97 warnings but no new issues)

The check_crashed_tasks() function was incorrectly detecting normal task
completions as crashes. This happened because:

1. A task sends ToolEvent::Completed via the channel
2. The task terminates (is_finished() == true)
3. Before the event is processed, check_crashed_tasks() runs
4. It sees is_finished() == true and assumes it's a crash
5. It sends ANOTHER ToolEvent::Failed

The fix properly awaits the JoinHandle to check if the task actually
panicked or was cancelled. If handle.await returns Ok(()), we know the
task completed normally and its event is just pending in the channel.

This fixes the 'Subagent task terminated unexpectedly (possible panic or
cancellation)' error that appeared for tools like ListSubagents, Glob, LS.
echobt merged commit 736aeb2 into master on Jan 27, 2026
2 of 3 checks passed
echobt deleted the fix/subagent-crash-detection-race-condition branch on January 27, 2026 15:18