Forked from pingcap/dead-mans-switch
Dead Man's Switch is a simple Prometheus Alertmanager webhook service. It provides a basic mechanism to ensure that the alerting pipeline is healthy.
It will send notifications to PagerDuty when it detects that the alerting pipeline is unhealthy.
Prometheus lets you define an alert that is always firing by using expr: vector(1) in a rule. For example:

    - alert: Watchdog
      annotations:
        message: |
          This is an alert meant to ensure that the entire alerting pipeline is functional.
          This alert is always firing, therefore it should always be firing in Alertmanager
          and always fire against a receiver. There are integrations with various notification
          mechanisms that send a notification when this alert is not firing. For example the
          "DeadMansSnitch" integration in PagerDuty.
      expr: vector(1)
      labels:
        severity: none

We call this the Watchdog alert. With it, we can monitor the alerting pipeline: if the alert is not firing, the alert system is unhealthy and alerts are not being sent.
Dead Man's Switch exposes a webhook endpoint at /webhook, which should receive the Watchdog alert from Alertmanager.
It monitors the alerts it receives and evaluates them against the rules defined in the configuration file.
You need to define your own Watchdog alert in your monitoring infrastructure.
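
The idea behind the service can be sketched in a few lines of Python. This is only an illustration, not the service's actual implementation: the class name, the `max_silence` parameter, and the injectable clock are assumptions made for the example. The payload shape follows the Alertmanager webhook format, where `alerts` is a list of entries carrying a `status` and a `labels` map.

```python
import time


class DeadMansSwitch:
    """Tracks when a firing Watchdog alert was last received (hypothetical helper)."""

    def __init__(self, max_silence=300.0, clock=time.monotonic):
        # Seconds allowed between Watchdog alerts; 300 is an assumed value.
        self.max_silence = max_silence
        self.clock = clock
        # Treat startup as the first heartbeat.
        self.last_seen = clock()

    def handle_webhook(self, payload):
        """Record a heartbeat if the payload contains a firing Watchdog alert."""
        for alert in payload.get("alerts", []):
            labels = alert.get("labels", {})
            if alert.get("status") == "firing" and labels.get("alertname") == "Watchdog":
                self.last_seen = self.clock()
                return True
        return False

    def is_healthy(self):
        """Healthy means a heartbeat arrived within the last max_silence seconds."""
        return self.clock() - self.last_seen <= self.max_silence


if __name__ == "__main__":
    switch = DeadMansSwitch(max_silence=300.0)
    switch.handle_webhook(
        {"alerts": [{"status": "firing", "labels": {"alertname": "Watchdog"}}]}
    )
    if not switch.is_healthy():
        print("alerting pipeline looks unhealthy; a notification would be sent")
```

In this sketch the absence of the Watchdog alert, not its presence, is what triggers a notification, which is why the check is based on elapsed time since the last heartbeat.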
The application is configured via command-line flags and a simple YAML configuration file.
-port: Port on which the server listens for webhooks, health checks, and metrics.
-config: Path to the configuration file.
PAGERDUTY_API_KEY: Overrides notify.pagerDuty.key, so the key may be omitted from the configuration file.
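
The override rule for the PagerDuty key can be illustrated with a short Python sketch. It assumes that PAGERDUTY_API_KEY is read from the process environment and that the configuration file has been parsed into nested dictionaries. The function name `resolve_pagerduty_key` is hypothetical and not part of the service.

```python
import os


def resolve_pagerduty_key(config, env=None):
    """Return the PagerDuty key, preferring PAGERDUTY_API_KEY over the file value."""
    env = os.environ if env is None else env
    override = env.get("PAGERDUTY_API_KEY")
    if override:
        return override
    # Fall back to notify.pagerDuty.key from the parsed configuration file.
    return config.get("notify", {}).get("pagerDuty", {}).get("key")
```

With this precedence, the key can be kept out of the configuration file and supplied only at deploy time.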
The configuration file defines the evaluation interval and the rules used to check whether the alerts received are the expected ones and whether a notification should be triggered.
There are also options to configure the notification sent to PagerDuty.
An example configuration is available in config.example.yaml.
Make sure the Watchdog alert is sent to Dead Man's Switch as an Alertmanager Receiver:

    route:
      routes:
        - receiver: dead-mans-switch
          match:
            alertname: 'Watchdog'

Make sure the Alertmanager Receiver is defined:

    receivers:
      - name: dead-mans-switch
        webhook_configs:
          - url: http://dead-mans-switch:8080/webhook

Build binary in local environment:

    make build

Run in local environment:

    make run

Send an Alertmanager webhook payload:

    curl -H "Content-Type: application/json" --data @payload.json http://localhost:8080/webhook

The release process is triggered by tags. To trigger a new image build and release, use the following:

    git tag <version>
    git push origin --tags

The Docker image should be available at ghcr.io/XLabs/dead-mans-switch with the tags <version> and latest.
Pull requests trigger the pipeline but will not publish the built container image.