Description
First up, thanks for forking, maintaining, and improving this project! It's already doing most of what I want from a tool like this, but there are a few things I'd like to see implemented before it ticks all the boxes for me.
By default, restic will consider snapshots for removal by grouping them by host name and path:
> When `forget` is run with a policy, restic first loads the list of all snapshots and groups them by their host name and paths. [...] The policy is then applied to each group of snapshots individually. This is a safety feature to prevent accidental removal of unrelated backup sets.
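The grouping described in that quote can be illustrated with a minimal Python sketch of per-group retention. The `Snapshot` class, the helper functions, and the keep-newest-N rule are hypothetical simplifications. restic's real policies (`--keep-daily`, `--keep-weekly`, and so on) are richer, but the principle is the same: a policy never looks across groups.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified model of a restic snapshot's metadata.
    snap_id: str
    host: str
    paths: tuple[str, ...]
    day: date


def group_snapshots(snapshots, keys=("host", "paths")):
    """Group snapshots by the given attributes (restic's default is host + paths)."""
    groups = defaultdict(list)
    for snap in snapshots:
        groups[tuple(getattr(snap, k) for k in keys)].append(snap)
    return dict(groups)


def keep_newest(snapshots, n):
    """Toy stand-in for a retention policy: keep the n newest snapshots."""
    return sorted(snapshots, key=lambda s: s.day, reverse=True)[:n]


def apply_policy(snapshots, n, keys=("host", "paths")):
    """Apply the policy to each group independently and return the ids kept."""
    kept = set()
    for group in group_snapshots(snapshots, keys).values():
        kept.update(s.snap_id for s in keep_newest(group, n))
    return kept
```

Passing `keys=("paths",)` corresponds to grouping by path only, which is what `restic.forget` in this project does according to the note below.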
Currently, the hostname used by the backup container isn't necessarily stable or predictable, though that can hopefully be addressed in #104. (Looking closer now, I see that restic.forget here groups only by path and excludes the host name.)
Path is the second part of the equation. Right now, only a single backup/snapshot is made, with the path /volumes:
From stack-back/src/restic_compose_backup/cli.py, lines 237 to 242 in e11842b:

```python
# back up volumes
if has_volumes:
    try:
        logger.info("Backing up volumes")
        vol_result = restic.backup_files(config.repository, source="/volumes")
        logger.debug("Volume backup exit code: %s", vol_result)
        # ... (excerpt ends here)
```
This puts all volumes selected for backup into a single snapshot. That should be fine when this project runs as a sidecar in each compose project, but when a single instance backs up multiple projects, the forget process may not make the best choices.
I'm thinking of a scenario where I set INCLUDE_ALL_COMPOSE_PROJECTS and INCLUDE_PROJECT_NAME and run stack-back as its own dedicated compose project. If I then spin up a new compose project, run it for a few days, and bring it down, its data will exist only in the snapshots from those days. The forget process has no way of knowing that, so it might remove the daily/weekly snapshots that are the only ones containing that project's data.
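That scenario can be simulated with a short Python sketch. It assumes a hypothetical layout where per-project snapshots use paths like /volumes/app and /volumes/demo, and it uses a simplified keep-newest-N rule in place of the real daily/weekly policies.

```python
from collections import defaultdict


def surviving_projects(snapshots, keep):
    """Return the projects still present after a per-path keep-newest-N policy.

    snapshots: iterable of (day, path, projects) tuples, where projects is the
    set of compose projects whose data that snapshot contains.
    """
    groups = defaultdict(list)
    for day, path, projects in snapshots:
        groups[path].append((day, projects))
    survivors = set()
    for entries in groups.values():
        # Newest first; only the first `keep` entries of each path group survive.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        for _, projects in entries[:keep]:
            survivors |= projects
    return survivors


def single_snapshot_history(days=10, demo_days=3):
    """Current behaviour: one /volumes snapshot per day containing every project."""
    history = []
    for day in range(1, days + 1):
        projects = {"app"} | ({"demo"} if day <= demo_days else set())
        history.append((day, "/volumes", frozenset(projects)))
    return history


def per_project_history(days=10, demo_days=3):
    """Proposed behaviour: one snapshot per project per day (hypothetical paths)."""
    history = []
    for day in range(1, days + 1):
        history.append((day, "/volumes/app", frozenset({"app"})))
        if day <= demo_days:
            history.append((day, "/volumes/demo", frozenset({"demo"})))
    return history
```

With `keep=3`, the single-path history keeps only days 8 to 10, none of which contain "demo", so that project's data is gone. The per-project history keeps demo's three snapshots in their own group.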
What I'm proposing is that each volume (or each compose project, if INCLUDE_PROJECT_NAME is set) gets its own snapshot. That way, the forget process knows not to remove snapshots that contain unique data that may no longer be getting backed up. De-duplication happens across the entire repository, and parent snapshots are matched on the same hostname/path pair, so this shouldn't lose any of the benefits of restic's compression and de-duplication.

I think it would also be a good idea to include a project's database dump in that same snapshot, so the complete state is captured at a single point in time and the same forget rules apply to it. I'm not sure whether a single snapshot can combine stdin and a path, though, so this might otherwise need a temp file, which could be less efficient.
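The proposal could look roughly like the following minimal sketch. The function names, the assumption that each direct child of /volumes is one unit to snapshot, and the retention numbers are hypothetical and do not reflect how stack-back is currently structured. The restic options used (`-r`, `backup`, `forget`, `--group-by`, `--keep-daily`, `--keep-weekly`, `--prune`) are standard restic CLI flags.

```python
import subprocess
from pathlib import Path


def per_unit_backup_commands(repository, root="/volumes"):
    """Build one `restic backup` invocation per top-level directory under root.

    Assumption (hypothetical layout): each direct child of root is one unit to
    snapshot separately, e.g. a volume, or a project when INCLUDE_PROJECT_NAME
    is set. Plain files directly under root are ignored.
    """
    units = sorted(p for p in Path(root).iterdir() if p.is_dir())
    return [["restic", "-r", repository, "backup", str(p)] for p in units]


def forget_command(repository, keep_daily=7, keep_weekly=4):
    """Build a forget command that applies retention per path group."""
    return [
        "restic", "-r", repository, "forget",
        "--group-by", "paths",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--prune",
    ]


def run_all(commands):
    """Run each command in order, raising CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Because each unit has its own path, restic keeps a separate retention history for it. `--keep-daily` counts days that have snapshots, so a unit that stops being backed up retains its last snapshots instead of being crowded out by newer snapshots of other units.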
I'll probably take a run at creating a PR for this in the next week or two.