Check boot counter #6

@alfs

Description

I suggest adding a check to biteback that reads the value of BOOTCOUNT in /.bootos.

If this value is nearing the reinstallation trigger, biteback should send a warning (syslog? email? zmq?) that:
a) informs that the node may soon be reinstalled, and
b) attaches the output from the biteback log to help troubleshoot the problem.
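A minimal sketch of the read side of such a check, in Python. It assumes (not confirmed in this issue) that /.bootos holds shell-style `KEY=VALUE` lines such as `BOOTCOUNT=2`; the function name `read_bootcount` is hypothetical and not part of biteback.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed format: shell-style lines like BOOTCOUNT=2 or BOOTCOUNT="2".
# Adjust the pattern if the real /.bootos uses a different layout.
_BOOTCOUNT_RE = re.compile(r"^\s*BOOTCOUNT\s*=\s*[\"']?(\d+)[\"']?\s*$")


def read_bootcount(path: str = "/.bootos") -> Optional[int]:
    """Return the BOOTCOUNT value from the bootos file, or None if absent."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        return None
    for line in text.splitlines():
        match = _BOOTCOUNT_RE.match(line)
        if match:
            return int(match.group(1))
    # File exists but contains no BOOTCOUNT line.
    return None
```

Returning `None` rather than 0 keeps "no counter found" distinguishable from "counter is zero", which is itself worth reporting.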

Setting the warn limit to, for example, >= 3 should give ample time to troubleshoot why the boot counter wasn't reset before the node is reinstalled. After the reinstall, we will only know that it was triggered by reaching the boot count limit, not why the node ended up there. Using 3 instead of 1 also gives the node some time to recover on its own without raising unnecessary warnings.
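A possible shape for the warning itself, again a hypothetical Python sketch rather than biteback's actual code. It picks syslog out of the transport options listed above, uses the warn limit of 3 suggested here, and attaches the last lines of the biteback log. The log path and the names `tail`, `build_warning` and `check_and_warn` are assumptions.

```python
import syslog
from collections import deque
from typing import List

# Warn limit suggested in this issue; assumed to be below the reinstall limit.
WARN_LIMIT = 3
# Hypothetical location of the biteback log; adjust to the real path.
BITEBACK_LOG = "/var/log/biteback.log"


def tail(path: str, lines: int = 50) -> List[str]:
    """Return up to the last `lines` lines of a text file, or [] if it is missing."""
    try:
        with open(path, errors="replace") as f:
            return list(deque((line.rstrip("\n") for line in f), maxlen=lines))
    except FileNotFoundError:
        return []


def build_warning(bootcount: int, warn_limit: int, log_lines: List[str]) -> str:
    """Compose the warning text: a header followed by the biteback log tail."""
    header = (
        f"BOOTCOUNT is {bootcount} (warn limit {warn_limit}): "
        "node may soon be reinstalled"
    )
    return "\n".join([header, "--- biteback log tail ---", *log_lines])


def check_and_warn(bootcount: int, log_path: str = BITEBACK_LOG,
                   warn_limit: int = WARN_LIMIT) -> bool:
    """Send a syslog warning if bootcount has reached warn_limit; return True if sent."""
    if bootcount < warn_limit:
        return False
    message = build_warning(bootcount, warn_limit, tail(log_path))
    syslog.openlog("biteback")
    # One syslog record per line, so the log tail stays readable.
    for line in message.splitlines():
        syslog.syslog(syslog.LOG_WARNING, line)
    syslog.closelog()
    return True
```

Each line goes out as its own syslog record because syslog daemons often escape or mangle embedded newlines. Email or zmq could replace the syslog calls without changing `build_warning`.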

There are, of course, reinstall cases where biteback is not run at all (reboots in quick succession come to mind), but adding this one extra check would give more insight into cases where other parts fail and trigger a reinstall.
