Skip to content

Will-Luck/Docker-Guardian

 
 

Repository files navigation

Docker-Guardian

CI GitHub Release Downloads GHCR

Dependency-aware container recovery for Docker. Rewritten in Go from willfarrell/docker-autoheal (MIT).

Static Go binary on Alpine. Multi-arch (amd64/arm64). No runtime dependencies.

Quick Start

docker run -d \
  --name docker-guardian \
  --restart=always \
  -e AUTOHEAL_CONTAINER_LABEL=all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/will-luck/docker-guardian

With notifications and metrics:

docker run -d \
  --name docker-guardian \
  --restart=always \
  -e AUTOHEAL_CONTAINER_LABEL=all \
  -e METRICS_PORT=9090 \
  -e NOTIFY_GOTIFY_URL=http://gotify:8080 \
  -e NOTIFY_GOTIFY_TOKEN=Axxxxxxxxx \
  -e NOTIFY_HOSTNAME=my-server \
  -e TZ=Europe/London \
  -p 9090:9090 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/will-luck/docker-guardian

What it does

Docker-Guardian restarts unhealthy containers, same as docker-autoheal. On top of that:

  • Circuit breaker — exponential backoff and restart budgets prevent restart storms
  • Dependency recovery — auto-restarts containers orphaned when their --network=container:X parent dies (exit code 128)
  • Event-driven — reacts to Docker events in real-time instead of polling
  • Orchestration awareness — pauses during Watchtower updates and backup jobs
  • Notifications — 9 native services (Gotify, Discord, Slack, Telegram, Pushover, Pushbullet, LunaSea, Email, Webhook) with rate limiting and retry
  • Prometheus metrics/metrics endpoint for observability
  • Per-container control — action labels (restart, stop, notify, none), notification filtering, custom stop timeouts

All original autoheal functionality is preserved.

The Problem

Containers sharing a network namespace (--network=container:X) all die with exit code 128 when the parent stops. Docker's restart policy brings back the parent, but dependents stay dead:

vpn-gateway (parent)
  ├── app-1  (--network=container:vpn-gateway)
  ├── app-2  (--network=container:vpn-gateway)
  └── app-3  (--network=container:vpn-gateway)

Docker-Guardian detects this and restarts them automatically.

Key Configuration

Variable Default Description
AUTOHEAL_CONTAINER_LABEL autoheal Label filter (all for all containers)
AUTOHEAL_INTERVAL 5 Poll interval in seconds
AUTOHEAL_GRACE_PERIOD 300 Skip recently-stopped containers (seconds)
AUTOHEAL_WATCHTOWER_COOLDOWN 300 Pause after orchestration activity (0 to disable)
AUTOHEAL_RESTART_BUDGET 5 Max restarts per window before circuit opens
AUTOHEAL_UNHEALTHY_THRESHOLD 1 Consecutive unhealthy checks before action
NOTIFY_HOSTNAME (empty) Prefix notifications with [hostname]
METRICS_PORT 0 Prometheus metrics port (0 = disabled)
TZ (empty) Timezone (e.g. Europe/London)

Full reference: docs/configuration.md

Documentation

Document Contents
Configuration All env vars, container labels
Features Circuit breaker, dependencies, orchestration awareness, metrics, decision flowchart
Notifications 9 services, event filtering, hostname prefix, healthcheck output
Development Building, testing, differences from upstream

Licence

MIT — see LICENSE. Original work by Will Farrell.

About

Dependency-aware container recovery for Docker. Go rewrite of docker-autoheal with circuit breakers, dependency recovery, 9 notification services, and Prometheus metrics.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • Go 70.2%
  • Shell 27.6%
  • Dockerfile 1.7%
  • Makefile 0.5%