Investigate suspend and resume behaviour #377

@bloodearnest

Various users have reported that if they suspend and resume their machine during an opensafely run invocation, the run doesn't come back properly.

When running locally, jobs sometimes hang and the run needs to be killed and restarted (an experience made worse by the poor CLI UX for selecting which actions to run with respect to forcing re-runs). This has been observed on Macs, and a quick search shows some users reporting issues like this (e.g. docker cp hanging, which we've observed before in Docker for Windows).

We should investigate the behaviour of sleep/wake on running docker containers on macOS and Windows, and understand the failure cases.

We may be able to make changes to opensafely run that detect and handle these failures more robustly, allowing users to suspend without issue.
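One possible starting point for the detection side, as a rough sketch only: a polling loop can compare how much wall-clock time passed between ticks with how long it meant to sleep. A large unexplained gap suggests the machine was suspended. The names `suspected_suspend` and `watch` are hypothetical and not part of opensafely, and the tolerance is a guess. Wall-clock time can also jump for other reasons (e.g. clock adjustments), so this would only be a hint that something needs re-checking.

```python
import time


def suspected_suspend(previous_tick, current_tick, poll_interval, tolerance=5.0):
    """Return the unexplained gap in seconds if it exceeds tolerance, else 0.0.

    previous_tick and current_tick are wall-clock timestamps (time.time()).
    """
    gap = (current_tick - previous_tick) - poll_interval
    return gap if gap > tolerance else 0.0


def watch(poll_interval, ticks, on_resume, clock=time.time, sleep=time.sleep):
    """Poll `ticks` times and call on_resume(gap) after each suspected suspend.

    clock and sleep are injectable so the loop can be exercised without waiting.
    """
    previous = clock()
    for _ in range(ticks):
        sleep(poll_interval)
        current = clock()
        gap = suspected_suspend(previous, current, poll_interval)
        if gap:
            # Hypothetical hook: this is where a run loop might re-check the
            # state of its docker containers before carrying on.
            on_resume(gap)
        previous = current
```

What `on_resume` should actually do (re-inspect containers, retry a hung docker call, or fail the job cleanly) is exactly what the investigation would need to establish.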

This might also be related to how opensafely runs in codespaces. Ideally, users should be able to leave an opensafely run command executing, and the codespace would keep running rather than suspend. However, users report that this is not the current behaviour, which leaves some of them unable to use codespaces for this.

We may also want to improve detection of a codespace being restarted in the middle of a run, and handle that more gracefully.
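As a sketch of what restart detection could look like, assuming a Linux environment where /proc/sys/kernel/random/boot_id changes on each boot, and assuming (unverified) that a codespace restart shows up as a new boot ID: a run could write a marker file when it starts, remove it when it finishes, and compare boot IDs on the next invocation. The function names and the marker file are hypothetical and not part of opensafely.

```python
import json
from pathlib import Path

# Linux-only; the file is absent on macOS and Windows hosts.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def read_boot_id(path=BOOT_ID_PATH):
    """Return the kernel boot ID, or None where it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def start_run(marker, boot_id):
    """Record the boot ID at the start of a run (marker is a pathlib.Path)."""
    marker.write_text(json.dumps({"boot_id": boot_id}))


def finish_run(marker):
    """Remove the marker once the run has completed normally."""
    marker.unlink(missing_ok=True)


def interrupted_by_restart(marker, boot_id):
    """True if a leftover marker was written under a different boot ID."""
    if not marker.exists():
        return False
    try:
        recorded = json.loads(marker.read_text()).get("boot_id")
    except (OSError, ValueError):
        return False
    return recorded is not None and boot_id is not None and recorded != boot_id
```

A leftover marker with the same boot ID would instead point to the process having been killed or crashed without a restart, which may deserve different handling.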
