Script for running, scheduling and restoring local clearml server instances #288
OldaKodym wants to merge 3 commits into clearml:master from
Hi @OldaKo, this looks really cool! How did you test it?
Hello @jkhenning, it was a fairly manual process. I created a dummy task with a random scalar chart, a debug sample image and a text file artifact, and kept an infinite task running just to check that it doesn't break things. I let that run, ran the snapshot creation, killed the server and deleted everything in /opt/clearml. Then I reinitialized the server, copied over the backed-up config and docker-compose.yaml, started the Docker network of the new instance and ran the snapshot restoration. Scalars, debug samples and artifacts all appeared with no apparent issues.

I repeated a similar process for our live server, although I skipped the fileserver to save some time, since that's just rsyncing files back and forth anyway. Our ES instance holds ~80GB, and I again let a test task run to check that shards don't get corrupted by continuous logging while the snapshot is being created. Again, after deletion and restoration, everything seems to be in order.
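For anyone who wants to repeat this kind of check, a dummy task like the one described above could look roughly like the sketch below. It is not the task used in the test. The project name, task name, file name and the `random_walk` helper are made up for illustration, and it assumes the `clearml` and `numpy` packages are installed and the SDK is configured to point at the server under test.

```python
import random


def random_walk(steps, seed=0):
    """Return `steps` floats forming a seeded random walk (hypothetical helper)."""
    rng = random.Random(seed)
    value, series = 0.0, []
    for _ in range(steps):
        value += rng.uniform(-1.0, 1.0)
        series.append(value)
    return series


def run_dummy_task(project="backup-test", name="dummy-task", steps=100):
    """Log a scalar chart, one debug image and a text artifact to ClearML."""
    # Imported lazily so random_walk() works without clearml/numpy installed.
    import numpy as np
    from clearml import Task

    # Project and task names are placeholders, not taken from the PR.
    task = Task.init(project_name=project, task_name=name)
    logger = task.get_logger()

    # Scalar chart: one point per iteration.
    for iteration, value in enumerate(random_walk(steps)):
        logger.report_scalar(
            title="random", series="walk", value=value, iteration=iteration
        )

    # Debug sample: a random RGB image.
    image = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
    logger.report_image(title="debug", series="noise", iteration=0, image=image)

    # Text file artifact, uploaded from a local path.
    with open("note.txt", "w") as handle:
        handle.write("snapshot restore test\n")
    task.upload_artifact(name="note", artifact_object="note.txt")

    task.close()
```

Calling `run_dummy_task()` before taking the snapshot, and then checking the same task in the web UI after restoration, covers scalars, debug samples and artifacts in one go.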
A self-installing script with a command-line interface for live backups of a local ClearML server instance.
It allows you to create and restore ClearML snapshots. It supports backing up Elasticsearch, MongoDB, Redis, and fileserver data without shutting down the server. It also allows scheduling backups using cron jobs.
Simply run it with --help and go from there.
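The description does not say how the script snapshots Elasticsearch while the server keeps running. One standard way to do that is the Elasticsearch snapshot REST API, sketched below purely as an illustration and not as what the script does. The host, repository name, snapshot name and location are hypothetical, and a filesystem repository only works if its location is listed under `path.repo` in the Elasticsearch configuration.

```python
import json
import urllib.request


def snapshot_requests(host, repo, snapshot, location):
    """Build (method, url, body) tuples: register an fs repo, then snapshot."""
    base = host.rstrip("/")
    register = (
        "PUT",
        f"{base}/_snapshot/{repo}",
        {"type": "fs", "settings": {"location": location}},
    )
    create = (
        "PUT",
        f"{base}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        None,
    )
    return [register, create]


def restore_request(host, repo, snapshot):
    """Build the (method, url, body) tuple that restores a snapshot."""
    base = host.rstrip("/")
    return ("POST", f"{base}/_snapshot/{repo}/{snapshot}/_restore", None)


def send(method, url, body=None):
    """Send one request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A backup would then be `for request in snapshot_requests(...): send(*request)`, with all four arguments chosen for the local setup. Restoring into a cluster where the same indices are already open fails, so a restore is normally run against a fresh instance or after closing or deleting those indices.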

I tested it on a dummy server instance as well as on a multi-TB instance while tasks were running, and successfully restored the snapshots in both cases. Still, there may be some loose ends or edge cases left to handle. I would be glad for any feedback.