fix: sandbox probe diagnostics and proc-exit fast-fail (#762) #794

Open
ZeusCraft10 wants to merge 1 commit into different-ai:dev from ZeusCraft10:fix/762-sandbox-probe-connection-refused

Conversation

@ZeusCraft10

Summary

  • Capture orchestrator sidecar stdout/stderr during detached startup and include last 20 lines in failure output.
  • Detect early process exit and fail fast instead of waiting the full 90s timeout.
  • Add structured stage logs (spawn, first-health-attempt, proc-exit-early, timeout) with orchestratorOutput in progress event payloads.
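
A minimal sketch of the "last 20 lines" capture described above, using only the standard library. The names `OutputRing` and `snapshot_last_lines` are hypothetical and chosen for illustration; the PR's actual `snapshot_proc_output()` helper may have a different signature and storage type.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Number of trailing lines to retain (the PR mentions the last 20).
const MAX_LINES: usize = 20;

/// Hypothetical bounded line buffer; not the PR's actual type.
#[derive(Default)]
struct OutputRing {
    lines: VecDeque<String>,
}

impl OutputRing {
    /// Append one line, evicting the oldest once the cap is reached.
    fn push(&mut self, line: &str) {
        if self.lines.len() == MAX_LINES {
            self.lines.pop_front();
        }
        self.lines.push_back(line.trim_end().to_string());
    }
}

/// Copy the retained lines into a single newline-joined string.
/// A poisoned mutex is tolerated so diagnostics still surface after a panic.
fn snapshot_last_lines(ring: &Arc<Mutex<OutputRing>>) -> String {
    let guard = ring.lock().unwrap_or_else(|e| e.into_inner());
    guard.lines.iter().cloned().collect::<Vec<_>>().join("\n")
}
```

Capping the buffer keeps memory bounded even if the sidecar logs heavily during the startup window, while the snapshot returns an owned copy so the lock is held only briefly.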

Why

  • The sandbox probe currently discards the CommandChild and event receiver from spawn(), losing all sidecar output.
  • Timeout errors are opaque: Connection refused (os error 61) with no context about whether the process crashed, failed to bind, or hit a readiness issue.
  • When the orchestrator exits early, the probe still waits the full 90 seconds before reporting failure.
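
The fail-fast behavior can be sketched as a polling loop that checks an exit flag between health attempts. This is an illustration under stated assumptions: `wait_for_health`, `ProbeOutcome`, and the blocking `thread::sleep` are hypothetical stand-ins, and the real probe presumably runs in an async context.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical result type for the probe loop.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Healthy,
    ProcExitedEarly,
    TimedOut,
}

/// Poll `is_healthy` until it succeeds, the process is flagged as exited,
/// or the deadline passes, whichever happens first.
fn wait_for_health<F>(
    exited: &AtomicBool,
    timeout: Duration,
    poll: Duration,
    mut is_healthy: F,
) -> ProbeOutcome
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_healthy() {
            return ProbeOutcome::Healthy;
        }
        // Fail fast: a dead process will never become healthy.
        if exited.load(Ordering::SeqCst) {
            return ProbeOutcome::ProcExitedEarly;
        }
        if Instant::now() >= deadline {
            return ProbeOutcome::TimedOut;
        }
        std::thread::sleep(poll);
    }
}
```

With the flag already set, the loop returns after the first failed health check rather than sleeping until the 90-second deadline.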

Issue

Scope

  • packages/desktop/src-tauri/src/commands/orchestrator.rs: orchestrator_start_detached() function and new snapshot_proc_output() helper.

Out of scope

  • Root cause fix for why the orchestrator fails to bind on certain macOS Docker variants (OrbStack vs Docker Desktop). This PR provides the diagnostics to identify that root cause.
  • Frontend changes to surface the new orchestratorOutput field from progress events.
  • Version skew detection between app/orchestrator/server.

Testing

Ran

  • rustfmt --edition 2021 --check on the modified file — passes clean.
  • cargo check could not complete due to pre-existing Windows linker environment issue (OneDrive path with spaces breaks link.exe). Not caused by this PR — affects all Rust compilation on this machine.

Result

  • pass: rustfmt formatting check
  • external/env/auth blockers: cargo check / cargo test blocked by Windows link.exe path resolution conflict (pre-existing)

CI status

  • pass: pending CI run
  • code-related failures: none expected
  • external/env/auth blockers: local cargo check blocked by Windows linker issue (not PR-related)

Manual verification

  1. Trigger a sandbox probe where the orchestrator process crashes before health becomes ready — error should now include the orchestrator's last output lines and return in seconds, not 90s.
  2. Trigger a sandbox probe that times out (e.g. blocked port). The error should now include an "Orchestrator output (last lines):" section, and the progress event should carry stage: "timeout".
  3. Trigger a successful sandbox probe — behavior should be unchanged, health endpoint reached normally.
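
To make the expected error shape in the steps above concrete, here is a rough sketch of how a failure message might be assembled. The stage names and the "Orchestrator output (last lines):" header come from this PR description; the `Stage` enum, `format_failure`, and the exact wording of the first line are assumptions for illustration.

```rust
/// Stages named in the PR description; the enum itself is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    Spawn,
    FirstHealthAttempt,
    ProcExitEarly,
    Timeout,
}

impl Stage {
    /// Stage label as it would appear in logs and event payloads.
    fn as_str(self) -> &'static str {
        match self {
            Stage::Spawn => "spawn",
            Stage::FirstHealthAttempt => "first-health-attempt",
            Stage::ProcExitEarly => "proc-exit-early",
            Stage::Timeout => "timeout",
        }
    }
}

/// Build a failure message, appending captured output only when there is any.
fn format_failure(stage: Stage, cause: &str, output: &str) -> String {
    let mut msg = format!("sandbox probe failed (stage: {}): {}", stage.as_str(), cause);
    if !output.trim().is_empty() {
        msg.push_str("\nOrchestrator output (last lines):\n");
        msg.push_str(output.trim_end());
    }
    msg
}
```

Skipping the output section when nothing was captured keeps the message for a silent failure from ending in an empty header.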

Evidence

  • N/A — backend diagnostics change, no UI changes. Requires macOS with Docker to reproduce the original failure scenario.

Risk

  • Low. Changes are additive (new logs, new fields in event payloads). The only behavioral change is capturing (rx, child) from spawn() instead of discarding — the CommandChild is now held alive inside the async drain task, which is consistent with how all other sidecar spawns in the codebase work (see commands/engine.rs, orchestrator/mod.rs).
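
The ownership pattern described above (moving the child handle into the drain task so it stays alive while output is collected) can be sketched with standard-library threads and channels. `ProcEvent`, `ChildHandle`, and `spawn_drain` are hypothetical stand-ins; the actual code uses Tauri's sidecar event receiver and `CommandChild` inside an async task, whose types differ from these.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-in for sidecar events; the real event type differs.
enum ProcEvent {
    Stdout(String),
    Stderr(String),
    Terminated,
}

/// Stand-in for the child process handle.
struct ChildHandle;

/// Drain events into `lines` and set `exited` once the stream ends.
fn spawn_drain(
    rx: Receiver<ProcEvent>,
    child: ChildHandle,
    lines: Arc<Mutex<Vec<String>>>,
    exited: Arc<AtomicBool>,
) -> JoinHandle<()> {
    thread::spawn(move || {
        // Holding the handle here keeps it alive for the drain's lifetime.
        let _child = child;
        for event in rx {
            match event {
                ProcEvent::Stdout(l) | ProcEvent::Stderr(l) => {
                    lines.lock().unwrap_or_else(|e| e.into_inner()).push(l);
                }
                ProcEvent::Terminated => break,
            }
        }
        // Reached on Terminated or when every sender has been dropped.
        exited.store(true, Ordering::SeqCst);
    })
}
```

Setting the flag after the loop covers both an explicit termination event and a closed channel, so a probe loop watching the flag does not have to distinguish the two.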

Rollback

  • Revert this single commit. No migrations, no schema changes, no dependency additions.

…ifferent-ai#762)

The detached orchestrator startup discarded the CommandChild and event
receiver from spawn(), losing all sidecar output and making timeout
errors opaque ("Connection refused" with no context).

- Capture (rx, child) from spawn and drain stdout/stderr into a ring
  buffer via an async task.
- Detect early process exit via AtomicBool and fail fast with the
  last orchestrator output instead of waiting the full 90s timeout.
- Add structured stage logs: spawn (port/bind info), first-health-
  attempt, proc-exit-early, and timeout with orchestratorOutput in
  the emitted progress event payload.
- Add unit tests for snapshot_proc_output helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| openwork-software | Ready | Preview, Comment | Mar 8, 2026 5:35am |

@vercel

vercel bot commented Mar 8, 2026

@ZeusCraft10 is attempting to deploy a commit to the 0 Finance Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions
Contributor

github-actions bot commented Mar 8, 2026

The following comment was made by an LLM, it may be inaccurate:


Development

Successfully merging this pull request may close these issues.

[Bug]: Sandbox probe fails with localhost health connection refused even when Docker doctor is healthy