Skip to content

fix: wait for entire process group before resolving exited promise#141

Merged
teng-lin merged 2 commits intomainfrom
worktree-opencode-orphan
Feb 25, 2026
Merged

fix: wait for entire process group before resolving exited promise#141
teng-lin merged 2 commits intomainfrom
worktree-opencode-orphan

Conversation

@teng-lin
Copy link
Owner

Summary

  • Root cause of the remaining orphan: PR fix: kill process group to prevent orphaned opencode serve processes #135 fixed signal delivery (process group kill sends SIGTERM to the Go binary too) but missed the timing gap. proc.exited was wired to child.on("exit") — the Node wrapper — which exits almost immediately after SIGTERM via Node's default handler. The Go binary received the signal but was still running its graceful HTTP shutdown for 1–5 s. killProcess() returned while the Go binary was alive and port-bound → observable orphan.
  • Fix: add waitForProcessGroupDead(pgid) — a 50 ms-interval poll of process.kill(-pgid, 0) that resolves only when the process group is fully empty (ESRCH). The exited promise now chains through this poll after child.on("exit"), giving callers a true "everything in the group is dead" signal.
  • SIGKILL path unchanged: if the Go binary outlasts killGracePeriodMs (5 s), the existing escalation sends SIGKILL to the process group; the poll immediately sees ESRCH and exited resolves.

Test plan

  • pnpm typecheck passes
  • All 2926 unit + integration tests pass (pnpm test)
  • Strengthened grandchild test: removed vi.waitFor safety net — grandchild must be dead synchronously after await handle.exited resolves (the old vi.waitFor was masking the race)
  • pnpm test:e2e:opencode — all 23 tests pass, zero orphaned processes remain after the suite exits (ps aux | grep opencode clean)

🤖 Generated with Claude Code

PR #135 fixed signal delivery (process group kill reaches the Go binary)
but missed the timing issue: proc.exited resolved as soon as the Node
wrapper exited — nearly instantly after SIGTERM because Node's default
handler terminates synchronously. The Go binary (the actual opencode
serve server) received SIGTERM too but spent another 1–5 seconds on its
graceful HTTP shutdown, so killProcess() returned while the Go binary
was still alive and port-bound → observable orphan.

Fix: add waitForProcessGroupDead() which polls process.kill(-pgid, 0)
every 50 ms until the process group has no more members (ESRCH). The
exited promise now chains through this poll after child.on("exit") fires,
so callers get a true "everything in the group is dead" signal. If the
Go binary outlasts killGracePeriodMs, the existing SIGKILL escalation
kills the group and the poll sees ESRCH immediately.

Update the grandchild unit test to assert the grandchild is dead
synchronously after awaiting exited (no vi.waitFor safety net), which
would have caught this bug.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @teng-lin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request provides a critical fix for a process management issue where child processes, particularly Go binaries spawned by a Node.js wrapper, could become orphaned. The solution introduces a robust polling mechanism that ensures the entire process group has terminated before signaling that a process has exited, thereby preventing resource leaks and improving the reliability of process termination.

Highlights

  • Orphan Process Fix: Addressed a timing gap where the proc.exited promise resolved prematurely, leading to orphaned Go binaries after a SIGTERM signal was sent to the Node.js wrapper process.
  • Process Group Polling: Implemented a new utility function, waitForProcessGroupDead, which polls for the complete termination of a process group before the exited promise is resolved, ensuring all child processes are properly cleaned up.
  • Synchronous Grandchild Termination: Ensured that the exited promise now resolves only when all processes in the group, including grandchildren, are truly dead. This allowed for the removal of a vi.waitFor safety net in tests, enforcing synchronous termination checks.
Changelog
  • src/adapters/node-process-manager.test.ts
    • Removed vi.waitFor from the grandchild termination test to enforce synchronous death assertion.
    • Updated comments to reflect the new synchronous expectation for grandchild process termination.
  • src/adapters/node-process-manager.ts
    • Added a new utility function waitForProcessGroupDead to poll for the termination of a process group.
    • Modified the exited promise resolution logic to chain through waitForProcessGroupDead after the direct child's exit event, ensuring the entire process group is dead.
    • Updated comments explaining the rationale behind waiting for the entire process group.
Activity
  • Initial PR content generated with Claude Code.
  • All typecheck, unit, integration, and e2e tests (pnpm test:e2e:opencode) passed, confirming the fix and preventing regressions.
  • Strengthened grandchild test by removing vi.waitFor to expose and fix the race condition, ensuring synchronous termination.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a crucial fix to prevent orphaned processes by ensuring that the exited promise resolves only after the entire process group has terminated. The waitForProcessGroupDead function, which polls for the process group's existence, addresses a timing gap where the Node wrapper would exit before the underlying Go binary had completed its graceful shutdown. This change significantly improves the robustness of process management. Additionally, a minor improvement for readability and maintainability is suggested by extracting a magic number into a named constant.

Comment on lines +24 to +26
} catch {
// ESRCH: group is gone; EPERM: can't signal (treat as gone)
resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The catch block currently treats EPERM (permission denied) as if the process group is gone, similar to ESRCH. However, EPERM indicates that the process group exists but the current user lacks the necessary permissions to signal it. Resolving the waitForProcessGroupDead promise in this scenario could lead to the system incorrectly assuming the process group is dead, potentially leaving inaccessible but still-running processes orphaned. To ensure the process group is truly dead, the polling should continue on EPERM until the deadline is reached, allowing the killProcess() caller to escalate to SIGKILL if necessary.

      } catch (error: any) {
        if (error.code === 'ESRCH') {
          // ESRCH: group is gone
          resolve();
        } else if (error.code === 'EPERM') {
          // EPERM: Process group exists but current user lacks permission to signal.
          // This does not mean the group is dead. Continue polling until deadline.
          if (Date.now() >= deadline) {
            resolve(); // Resolve if deadline reached, allowing killProcess to escalate
          } else {
            setTimeout(poll, 50); // Continue polling
          }
        } else {
          // Other unexpected errors, log and resolve to avoid infinite loop
          console.error("Unexpected error in waitForProcessGroupDead:", error);
          resolve();
        }

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03de39e: EPERM now continues polling (the group still exists, just not signalable) rather than resolving immediately. Only ESRCH resolves the promise. Deadline/SIGKILL escalation still applies if EPERM persists.

resolve();
return;
}
setTimeout(poll, 50);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The polling interval 50 is a magic number. Defining it as a named constant would improve readability and maintainability, making it easier to understand its purpose and adjust if needed.

        const PROCESS_GROUP_POLL_INTERVAL_MS = 50;
        setTimeout(poll, PROCESS_GROUP_POLL_INTERVAL_MS);

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 03de39e: extracted to PROCESS_GROUP_POLL_INTERVAL_MS = 50.

@teng-lin teng-lin merged commit a4582e1 into main Feb 25, 2026
6 checks passed
@teng-lin teng-lin deleted the worktree-opencode-orphan branch February 25, 2026 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant