Description
I suggest adding a check to biteback that reads the value of BOOTCOUNT in /.bootos.
If this value is approaching the reinstallation trigger, a warning should be sent (syslog? email? zmq?) that
a) informs that the node may soon be reinstalled, and
b) includes the output from the biteback log to help troubleshoot the problem.
Setting the warning limit to, for example, >= 3 should give ample time to troubleshoot why the boot counter wasn't reset before the node is reinstalled. After a reinstall, we only know that it happened because the bootcount limit was reached, not why the node had to resort to it. A threshold of 3 rather than 1 also gives the node a chance to recover on its own without triggering unnecessary warnings.
There are, of course, reinstall cases where biteback is not run at all (reboots in quick succession come to mind), but this one extra check would provide more insight into cases where other parts fail and trigger a reinstall.
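A minimal sketch of what such a check could look like, for discussion only. It assumes that /.bootos contains shell-style `KEY=VALUE` lines, picks syslog out of the suggested channels, and uses a hypothetical biteback log path (`/var/log/biteback.log`) and hypothetical helper names; the real file format, log location and reinstall limit may differ. Only the warning threshold of 3 comes from the proposal above.

```python
import re
import syslog
from pathlib import Path
from typing import Optional

# Hypothetical locations and threshold; adjust to the real biteback/bootos setup.
BOOTOS_FILE = Path("/.bootos")
BITEBACK_LOG = Path("/var/log/biteback.log")  # assumed path, not confirmed
WARN_LIMIT = 3  # warn when BOOTCOUNT >= 3, as suggested in the issue


def parse_bootcount(text: str) -> Optional[int]:
    """Return BOOTCOUNT from KEY=VALUE lines (assumed format), or None if absent."""
    for line in text.splitlines():
        m = re.match(r"\s*BOOTCOUNT\s*=\s*['\"]?(\d+)['\"]?\s*$", line)
        if m:
            return int(m.group(1))
    return None


def should_warn(bootcount: Optional[int], warn_limit: int = WARN_LIMIT) -> bool:
    """True when the counter is present and at or above the warning limit."""
    return bootcount is not None and bootcount >= warn_limit


def build_warning(bootcount: int, log_tail: str) -> str:
    """Compose the warning text, attaching recent biteback log output."""
    return (f"BOOTCOUNT={bootcount}: node may soon be reinstalled. "
            f"Recent biteback log:\n{log_tail}")


def read_tail(path: Path, n: int = 20) -> str:
    """Return the last n lines of a file, or a placeholder if it can't be read."""
    try:
        lines = path.read_text(errors="replace").splitlines()
    except OSError:
        return "(biteback log unavailable)"
    return "\n".join(lines[-n:])


def check_bootcount() -> None:
    """Read BOOTCOUNT and send a syslog warning if it is near the limit."""
    try:
        text = BOOTOS_FILE.read_text()
    except OSError:
        return  # no /.bootos, nothing to check
    count = parse_bootcount(text)
    if should_warn(count):
        syslog.openlog("biteback")
        syslog.syslog(syslog.LOG_WARNING, build_warning(count, read_tail(BITEBACK_LOG)))


if __name__ == "__main__":
    check_bootcount()
```

Keeping the parsing and threshold logic in small pure functions makes it easy to swap syslog for email or zmq later without touching the check itself.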