fix: prevent sandbox Docker preflight from hanging indefinitely #789

Open
caoergou wants to merge 1 commit into different-ai:dev from caoergou:fix/sandbox-timeout-blocking-757

@caoergou commented Mar 7, 2026

Summary

Fixes #757

The run_local_command_with_timeout function could block indefinitely when joining pipe reader threads after killing a timed-out child process. This caused the UI to freeze on "Checking Docker..." during sandbox creation.

Problem

When Docker preflight takes too long or stalls:

  1. run_local_command_with_timeout spawns stdout/stderr reader threads
  2. On timeout, it kills the child process and waits for it
  3. Then it unconditionally joins the reader threads with join()
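  4. If the pipe readers never reach EOF (for example, because a process started by the child inherited the stdout/stderr pipes and is still running), join() can block forever
  5. This keeps sandbox_doctor() from returning, so the UI hangs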
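A minimal illustration of how step 4 can happen, assuming a Unix-like system with `sh` and `sleep` on PATH. The direct child exits right away, but a background process it started still holds the write end of the stdout pipe, so a read on that pipe does not see EOF until that process exits. The function `demo_inherited_pipe` is hypothetical and is not code from this PR.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Hypothetical demo: returns (time until the direct child exited,
/// time until its stdout pipe reported EOF, captured stdout).
fn demo_inherited_pipe() -> std::io::Result<(Duration, Duration, String)> {
    let start = Instant::now();
    // The shell starts `sleep 2` in the background, prints, and exits at once.
    // The background `sleep` inherits the write end of the stdout pipe.
    let mut child = Command::new("sh")
        .args(["-c", "sleep 2 & echo done"])
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was piped");

    // Reaping the direct child returns almost immediately...
    child.wait()?;
    let child_exited = start.elapsed();

    // ...but read_to_string() only returns at EOF, which arrives when every
    // write end is closed, i.e. when the background `sleep` exits ~2 s later.
    let mut out = String::new();
    stdout.read_to_string(&mut out)?;
    let eof_seen = start.elapsed();

    Ok((child_exited, eof_seen, out))
}

fn main() -> std::io::Result<()> {
    let (exited, eof, out) = demo_inherited_pipe()?;
    println!("child exited after {exited:?}; stdout EOF after {eof:?}; output {out:?}");
    Ok(())
}
```

A reader thread doing the same read_to_end() would stay blocked for the same reason, which is why an unconditional join() on it can outlive the child by an arbitrary amount.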

Solution

  • Replace unbounded join() with bounded recv_timeout() using mpsc channels
  • Add READER_JOIN_TIMEOUT_MS (2 seconds) as an upper bound for thread cleanup
  • Handle the Timeout and Disconnected cases explicitly, with telemetry logs
  • Drop the thread handles after the bounded wait, which detaches any reader still running instead of blocking on it
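A sketch of the bounded-join idea above, using only `std::sync::mpsc` and `std::thread`. The names `spawn_reader`, `join_bounded`, `BoundedReader` and `ReaderOutcome` are hypothetical, and `eprintln!` stands in for whatever telemetry logging the project actually uses; the helper in commands/orchestrator.rs may be structured differently.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Mirrors the 2-second bound the PR describes.
const READER_JOIN_TIMEOUT_MS: u64 = 2_000;

/// Hypothetical result type for a bounded wait on a worker thread.
#[derive(Debug, PartialEq)]
enum ReaderOutcome<T> {
    Finished(T),
    TimedOut,
    Disconnected,
}

/// A worker thread plus the channel it reports its result on.
struct BoundedReader<T> {
    rx: mpsc::Receiver<T>,
    handle: thread::JoinHandle<()>,
}

fn spawn_reader<T, F>(work: F) -> BoundedReader<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // send() only fails if the receiver is gone, i.e. nobody is waiting.
        let _ = tx.send(work());
    });
    BoundedReader { rx, handle }
}

/// Waits at most `timeout` for the worker's result instead of calling join() blindly.
fn join_bounded<T>(reader: BoundedReader<T>, timeout: Duration) -> ReaderOutcome<T> {
    let BoundedReader { rx, handle } = reader;
    match rx.recv_timeout(timeout) {
        Ok(value) => {
            // The result has been sent, so the thread is already finishing.
            let _ = handle.join();
            ReaderOutcome::Finished(value)
        }
        Err(RecvTimeoutError::Timeout) => {
            eprintln!("reader still running after {timeout:?}; detaching it");
            // Dropping the JoinHandle detaches the thread; it does not stop it.
            drop(handle);
            ReaderOutcome::TimedOut
        }
        Err(RecvTimeoutError::Disconnected) => {
            // The sender was dropped without a result, e.g. the worker panicked.
            let _ = handle.join();
            ReaderOutcome::Disconnected
        }
    }
}

fn main() {
    let stalled = spawn_reader(|| {
        thread::sleep(Duration::from_secs(30)); // simulates a pipe that never hits EOF
        Vec::<u8>::new()
    });
    let start = Instant::now();
    let outcome = join_bounded(stalled, Duration::from_millis(READER_JOIN_TIMEOUT_MS));
    println!("{outcome:?} after {:?}", start.elapsed());
}
```

Detaching only bounds how long the caller waits: a stalled reader keeps running (and keeps its pipe end open) until the pipe closes or the process exits.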

Changes

| File | Change |
|---|---|
| commands/orchestrator.rs | Add channel-based timeout helper with bounded join |
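Putting the pieces together, a hedged sketch of a timeout-bounded command runner in the spirit of run_local_command_with_timeout. It assumes polling with `try_wait()` and one shared deadline for both pipe readers, so cleanup adds at most READER_JOIN_TIMEOUT_MS in total; `run_with_timeout`, `CommandOutcome` and `POLL_INTERVAL_MS` are hypothetical names, and the PR may instead give each reader its own timeout.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::{Duration, Instant};

const READER_JOIN_TIMEOUT_MS: u64 = 2_000; // mirrors the PR's 2-second bound
const POLL_INTERVAL_MS: u64 = 50; // hypothetical polling interval

#[derive(Debug)]
struct CommandOutcome {
    timed_out: bool,
    stdout: Option<Vec<u8>>, // None if the reader missed the cleanup deadline
    stderr: Option<Vec<u8>>,
}

/// Spawns a thread that drains `pipe` and sends the bytes over a channel.
fn spawn_pipe_reader<R: Read + Send + 'static>(mut pipe: R) -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    // The JoinHandle is dropped immediately; the channel is the only thing we wait on.
    thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = pipe.read_to_end(&mut buf);
        let _ = tx.send(buf);
    });
    rx
}

fn run_with_timeout(program: &str, args: &[&str], timeout: Duration) -> std::io::Result<CommandOutcome> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let stdout_rx = spawn_pipe_reader(child.stdout.take().expect("stdout was piped"));
    let stderr_rx = spawn_pipe_reader(child.stderr.take().expect("stderr was piped"));

    let deadline = Instant::now() + timeout;
    let mut timed_out = false;
    loop {
        if child.try_wait()?.is_some() {
            break;
        }
        if Instant::now() >= deadline {
            eprintln!("command timed out after {timeout:?}; killing it");
            // Ignore kill() errors (e.g. the child exited first); wait() reaps it either way.
            let _ = child.kill();
            child.wait()?;
            timed_out = true;
            break;
        }
        thread::sleep(Duration::from_millis(POLL_INTERVAL_MS));
    }

    // One shared deadline bounds cleanup for both readers, instead of join().
    let reader_deadline = Instant::now() + Duration::from_millis(READER_JOIN_TIMEOUT_MS);
    let collect = |rx: Receiver<Vec<u8>>| -> Option<Vec<u8>> {
        let remaining = reader_deadline.saturating_duration_since(Instant::now());
        let result = rx.recv_timeout(remaining).ok();
        if result.is_none() {
            eprintln!("pipe reader did not finish before the cleanup deadline");
        }
        result
    };
    let stdout = collect(stdout_rx);
    let stderr = collect(stderr_rx);
    Ok(CommandOutcome { timed_out, stdout, stderr })
}

fn main() -> std::io::Result<()> {
    let outcome = run_with_timeout("sh", &["-c", "echo hello"], Duration::from_secs(5))?;
    println!("{outcome:?}");
    Ok(())
}
```

With `sh -c "sleep 5 & sleep 5"` and a 200 ms timeout, this sketch returns after a little over 2 seconds with `timed_out` set and `stdout` equal to `None`, because the leftover `sleep` still holds the pipes; an unbounded join() would have waited the full 5 seconds.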

Testing

  • Code compiles and passes rustfmt --check
  • Manual testing requires a Docker environment with an intentionally stalled probe

Acceptance Criteria

  • A repro with an intentionally stalled Docker probe does not leave the modal stuck
  • refreshSandboxDoctor() returns within a bounded time in all failure paths
  • The user sees a deterministic outcome rather than an indefinite wait

…erent-ai#757)

The `run_local_command_with_timeout` function could block indefinitely
when joining pipe reader threads after killing a timed-out child process.
This caused the UI to freeze on "Checking Docker..." during sandbox creation.

Changes:
- Replace unbounded `join()` with bounded `recv_timeout()` using channels
- Add 2-second upper bound for pipe reader thread cleanup
- Add telemetry logs around timeout/kill/join operations
- Handle timeout and disconnect cases explicitly

The fix ensures that `sandbox_doctor()` returns within a bounded time in
all failure paths, preventing UI deadlocks.

vercel bot commented Mar 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| openwork-software | Ready | Preview, Comment | Mar 7, 2026 9:39am |

vercel bot commented Mar 7, 2026

@caoergou is attempting to deploy a commit to the 0 Finance Team on Vercel.

A member of the Team first needs to authorize it.

github-actions bot commented Mar 7, 2026

The following comment was made by an LLM; it may be inaccurate:

Development

Successfully merging this pull request may close these issues.

[Bug]: Sandbox create can hang indefinitely in Docker preflight
