fix: replace permanent watchdog halt with self-healing cooldown #74

Merged
dzianisv merged 1 commit into main from issue-73-tunnel-monitor
Feb 14, 2026

@dzianisv
Owner

Summary

  • Root cause: The tunnel watchdog circuit breaker permanently halted (halted = true) after 5 restarts within 10 minutes, leaving the tunnel disconnected forever with no recovery path
  • Fix: Replace permanent halt with exponential backoff cooldown (5min → 10min → 20min → 30min cap) that automatically resumes monitoring after the cooldown expires
  • Process liveness: Add fast-path check — if this.process is null or killed, immediately trigger restart without waiting for 3 metric check failures
  • Recovery reset: When tunnel successfully recovers, reset cooldown count to 0
  • CLI display: opencode-manager status now shows watchdog cooldown state and remaining time when in cooldown
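The cooldown schedule and recovery reset described above can be sketched roughly as follows. This is an illustration only, not the code from `tunnel-service.ts`: the constant names, the `WatchdogState` shape, and the helper functions are all hypothetical, and only the 5 → 10 → 20 → 30 minute (capped) schedule is taken from the summary.

```typescript
// Hypothetical names; only the 5 -> 10 -> 20 -> 30 (cap) minute schedule comes from the PR.
const MINUTE_MS = 60 * 1000;
const BASE_COOLDOWN_MS = 5 * MINUTE_MS;
const MAX_COOLDOWN_MS = 30 * MINUTE_MS;

interface WatchdogState {
  cooldownCount: number;        // cooldowns entered since the last successful recovery
  cooldownUntil: number | null; // epoch ms when monitoring resumes, or null if active
}

// Doubles the base duration for each prior cooldown, capped at the maximum.
function cooldownDurationMs(cooldownCount: number): number {
  return Math.min(BASE_COOLDOWN_MS * 2 ** cooldownCount, MAX_COOLDOWN_MS);
}

// Used where a permanent halt would otherwise occur: schedule a resume time instead.
function enterCooldown(state: WatchdogState, now: number): WatchdogState {
  return {
    cooldownCount: state.cooldownCount + 1,
    cooldownUntil: now + cooldownDurationMs(state.cooldownCount),
  };
}

// Monitoring resumes on its own once the deadline has passed.
function isCoolingDown(state: WatchdogState, now: number): boolean {
  return state.cooldownUntil !== null && now < state.cooldownUntil;
}

// A successful recovery resets the backoff to its first step.
function onRecovered(): WatchdogState {
  return { cooldownCount: 0, cooldownUntil: null };
}
```

Because the resume time is stored as a timestamp rather than a flag, a periodic check can compare it against the current time and needs no separate "un-halt" step.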

Files Changed

  • backend/src/services/tunnel-service.ts — Core watchdog logic: cooldown constants, extended TunnelStatus interface, self-healing startWatchdog() method
  • bin/cli.ts — Extended TunnelStatusResponse type and added cooldown display in commandHealth() output
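As a rough sketch of what a cooldown line in the CLI output might look like: the field names (`inCooldown`, `cooldownUntil`, `cooldownCount`), the helper functions, and the output wording below are assumptions for illustration, not the actual `TunnelStatusResponse` fields or the real `commandHealth()` output.

```typescript
// Hypothetical subset of the status payload; the real field names may differ.
interface WatchdogCooldownInfo {
  inCooldown: boolean;
  cooldownUntil: number | null; // epoch ms when monitoring resumes
  cooldownCount: number;
}

// Renders a millisecond duration as "4m 30s" or "45s"; negative values clamp to "0s".
function formatRemaining(ms: number): string {
  const totalSeconds = Math.max(0, Math.ceil(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Builds the single status line shown to the user.
function describeWatchdog(info: WatchdogCooldownInfo, now: number): string {
  if (!info.inCooldown || info.cooldownUntil === null) {
    return "watchdog: active";
  }
  const remaining = formatRemaining(info.cooldownUntil - now);
  return `watchdog: cooldown #${info.cooldownCount}, resumes in ${remaining}`;
}
```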

Testing Done

  • Build succeeds: pnpm build
  • All 258 tests pass across 14 files: pnpm test
  • Tunnel service tests (17 tests) pass including status field validation

Closes #73

The tunnel watchdog circuit breaker permanently halted after 5 restarts,
leaving the tunnel disconnected forever. Replace with exponential backoff
cooldown (5min → 10min → 20min → 30min cap) that automatically resumes
monitoring. Add process liveness fast-path to detect dead cloudflared
immediately. Reset cooldown count on successful recovery.
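
The process liveness fast path could look something like the sketch below. The `ProcessLike` interface, the `decideAction` function, and the threshold constant are hypothetical stand-ins. Only the rule itself (restart immediately when the process handle is null or killed, otherwise wait for 3 failed metric checks) comes from the description.

```typescript
// Minimal stand-in for the one ChildProcess field this sketch reads.
interface ProcessLike {
  killed: boolean;
}

// From the PR text: the slow path waits for 3 failed metric checks.
const FAILURE_THRESHOLD = 3;

type WatchdogAction = "restart" | "wait";

function decideAction(
  proc: ProcessLike | null,
  consecutiveMetricFailures: number,
): WatchdogAction {
  // Fast path: no process handle, or the process was killed.
  if (proc === null || proc.killed) {
    return "restart";
  }
  // Slow path: the process looks alive, so rely on the metric checks.
  return consecutiveMetricFailures >= FAILURE_THRESHOLD ? "restart" : "wait";
}
```

One caveat if this maps onto Node's `ChildProcess`: its `killed` property only reflects a signal sent through `kill()`, so a process that exited on its own would need a separate check such as clearing the handle in an exit handler.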

@github-actions

🔔 Push Browser E2E Test Recording

Run #22023115548 | Commit f57dcd3

@github-actions

🎥 Browser E2E Test Recording

Run #22023115548 | Commit f57dcd3

@github-actions

⚙️ Settings E2E Test Recording

Run #22023115548 | Commit f57dcd3

dzianisv merged commit 531d3ed into main on Feb 14, 2026
5 checks passed

Linked issue: "Opencode-manager suppose to monitor cloudlfare tunnel and fix/restart it in case it fails"