Description
Various users have reported that if they suspend and resume their machine during an opensafely run invocation, the run does not recover properly.
Running locally, jobs sometimes hang and the run needs to be killed and restarted (an experience made worse by the poor CLI UX for selecting which actions to run when forcing re-runs). This has been observed on Macs, and a quick search shows other users reporting similar issues (e.g. docker cp hanging, which we've observed before in Docker for Windows).
We should investigate the behaviour of sleep/wake on running Docker containers on macOS and Windows, and understand the failure cases.
We may be able to make changes to opensafely run that detect and handle these failures more robustly, allowing users to suspend without issue.
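As a starting point for discussion, here is a minimal sketch of two building blocks such detection could use. It is not how opensafely run currently works: the function names, the poll interval, and the tolerance are all hypothetical. The first helper flags a likely suspend by spotting an unexpectedly large wall-clock gap between two polls; the second puts a timeout around an external command so that a hung call (such as docker cp) is killed rather than blocking forever.

```python
import subprocess
import time


def suspended_seconds(previous_tick, current_tick, poll_interval, tolerance=5.0):
    """Estimate how long the host was asleep between two wall-clock readings.

    Returns 0.0 when the gap is within poll_interval + tolerance seconds,
    otherwise the time in excess of the expected poll interval.
    """
    excess = (current_tick - previous_tick) - poll_interval
    return excess if excess > tolerance else 0.0


def run_with_timeout(cmd, timeout):
    """Run cmd and return its CompletedProcess, or None if it timed out.

    subprocess.run kills the child process when the timeout expires,
    so a hung command does not block the caller indefinitely.
    """
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


def poll_once(previous_tick, poll_interval=2.0):
    """Hypothetical single iteration of a job-polling loop."""
    time.sleep(poll_interval)
    now = time.time()
    slept = suspended_seconds(previous_tick, now, poll_interval)
    if slept:
        # A real implementation might re-check container state here.
        print(f"Host appears to have been suspended for about {slept:.0f}s")
    return now
```

Wall-clock time is used here because monotonic clocks do not count time spent asleep on every platform. The trade-off is that a manual or NTP clock adjustment could produce a false positive, so a detected gap is best treated as a prompt to re-check job state rather than as proof of a suspend.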
This might also be related to how opensafely runs in Codespaces. Ideally, users should be able to leave an opensafely run command executing, and the codespace would keep running rather than suspend. However, users report that this is not the current behaviour, which leaves some of them unable to use Codespaces for this.
We may also want to look at improving detection of a codespace being restarted in the middle of a run, and handling that more gracefully.
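One possible approach to restart detection, sketched below under stated assumptions, is to write a marker file when a run starts and remove it when the run finishes. A leftover marker means the previous run did not finish cleanly, and comparing a stored boot ID with the current one suggests whether the machine was restarted in between. The marker file, its format, and all function names are hypothetical. The sketch also assumes that a codespace restart changes the Linux boot ID, which would need to be verified.

```python
import json
from pathlib import Path

# Linux-only source of a per-boot identifier; absent on macOS and Windows.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def current_boot_id():
    """Return the kernel boot ID, or None where it is unavailable."""
    try:
        return BOOT_ID_PATH.read_text().strip()
    except OSError:
        return None


def mark_run_started(state_file, boot_id):
    """Record that a run is in progress under the given boot ID."""
    state = {"status": "running", "boot_id": boot_id}
    Path(state_file).write_text(json.dumps(state))


def mark_run_finished(state_file):
    """Remove the marker once the run has completed cleanly."""
    Path(state_file).unlink(missing_ok=True)


def was_interrupted_by_restart(state_file, boot_id):
    """True if a leftover marker was written under a different boot ID."""
    path = Path(state_file)
    if not path.exists():
        return False
    try:
        state = json.loads(path.read_text())
    except (OSError, ValueError):
        # An unreadable marker is treated as an interrupted run.
        return True
    return state.get("boot_id") != boot_id
```

On startup, a command could call was_interrupted_by_restart with the value of current_boot_id() and, if it returns True, offer to clean up or resume the interrupted jobs instead of leaving them in an unknown state.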