fix: wait for entire process group before resolving exited promise#141
fix: wait for entire process group before resolving exited promise#141
Conversation
PR #135 fixed signal delivery (process group kill reaches the Go binary) but missed the timing issue: proc.exited resolved as soon as the Node wrapper exited — nearly instantly after SIGTERM because Node's default handler terminates synchronously. The Go binary (the actual opencode serve server) received SIGTERM too but spent another 1–5 seconds on its graceful HTTP shutdown, so killProcess() returned while the Go binary was still alive and port-bound → observable orphan. Fix: add waitForProcessGroupDead() which polls process.kill(-pgid, 0) every 50 ms until the process group has no more members (ESRCH). The exited promise now chains through this poll after child.on("exit") fires, so callers get a true "everything in the group is dead" signal. If the Go binary outlasts killGracePeriodMs, the existing SIGKILL escalation kills the group and the poll sees ESRCH immediately. Update the grandchild unit test to assert the grandchild is dead synchronously after awaiting exited (no vi.waitFor safety net), which would have caught this bug.
Summary of ChangesHello @teng-lin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request provides a critical fix for a process management issue where child processes, particularly Go binaries spawned by a Node.js wrapper, could become orphaned. The solution introduces a robust polling mechanism that ensures the entire process group has terminated before signaling that a process has exited, thereby preventing resource leaks and improving the reliability of process termination. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a crucial fix to prevent orphaned processes by ensuring that the exited promise resolves only after the entire process group has terminated. The waitForProcessGroupDead function, which polls for the process group's existence, addresses a timing gap where the Node wrapper would exit before the underlying Go binary had completed its graceful shutdown. This change significantly improves the robustness of process management. Additionally, a minor improvement for readability and maintainability is suggested by extracting a magic number into a named constant.
src/adapters/node-process-manager.ts
Outdated
| } catch { | ||
| // ESRCH: group is gone; EPERM: can't signal (treat as gone) | ||
| resolve(); |
There was a problem hiding this comment.
The catch block currently treats EPERM (permission denied) as if the process group is gone, similar to ESRCH. However, EPERM indicates that the process group exists but the current user lacks the necessary permissions to signal it. Resolving the waitForProcessGroupDead promise in this scenario could lead to the system incorrectly assuming the process group is dead, potentially leaving inaccessible but still-running processes orphaned. To ensure the process group is truly dead, the polling should continue on EPERM until the deadline is reached, allowing the killProcess() caller to escalate to SIGKILL if necessary.
} catch (error: any) {
if (error.code === 'ESRCH') {
// ESRCH: group is gone
resolve();
} else if (error.code === 'EPERM') {
// EPERM: Process group exists but current user lacks permission to signal.
// This does not mean the group is dead. Continue polling until deadline.
if (Date.now() >= deadline) {
resolve(); // Resolve if deadline reached, allowing killProcess to escalate
} else {
setTimeout(poll, 50); // Continue polling
}
} else {
// Other unexpected errors, log and resolve to avoid infinite loop
console.error("Unexpected error in waitForProcessGroupDead:", error);
resolve();
}There was a problem hiding this comment.
Fixed in 03de39e: EPERM now continues polling (the group still exists, just not signalable) rather than resolving immediately. Only ESRCH resolves the promise. Deadline/SIGKILL escalation still applies if EPERM persists.
src/adapters/node-process-manager.ts
Outdated
| resolve(); | ||
| return; | ||
| } | ||
| setTimeout(poll, 50); |
There was a problem hiding this comment.
There was a problem hiding this comment.
Fixed in 03de39e: extracted to PROCESS_GROUP_POLL_INTERVAL_MS = 50.
Summary
proc.exitedwas wired tochild.on("exit")— the Node wrapper — which exits almost immediately after SIGTERM via Node's default handler. The Go binary received the signal but was still running its graceful HTTP shutdown for 1–5 s.killProcess()returned while the Go binary was alive and port-bound → observable orphan.waitForProcessGroupDead(pgid)— a 50 ms-interval poll ofprocess.kill(-pgid, 0)that resolves only when the process group is fully empty (ESRCH). Theexitedpromise now chains through this poll afterchild.on("exit"), giving callers a true "everything in the group is dead" signal.killGracePeriodMs(5 s), the existing escalation sends SIGKILL to the process group; the poll immediately sees ESRCH andexitedresolves.Test plan
pnpm typecheckpassespnpm test)vi.waitForsafety net — grandchild must be dead synchronously afterawait handle.exitedresolves (the oldvi.waitForwas masking the race)pnpm test:e2e:opencode— all 23 tests pass, zero orphaned processes remain after the suite exits (ps aux | grep opencodeclean)🤖 Generated with Claude Code