Description
labyrinth/labyrinth/repo_processor.py
Line 201 in 207dbce
    mods = _df["id"].apply(lambda x: x % divisor)
This line takes the repo id modulo a divisor to decide how repos are split across parallel runs of the script. The problem is that the split is fixed: if one of the parallel runs fails repeatedly, the same block of repos never gets worked on.
We can't just randomize it, because each process would draw its assignment independently, so more than one process could end up handling the same repo.
So I'm thinking we need to add in some other factor that is constant for an individual run, but changes between runs.
It could be the hour of the day, or maybe there's some run ID that can be converted to an int? The former can come from within the Python code directly, whereas the latter might require modification to the workflow scripts, unless there is already some environment variable that the Python code can use.
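A rough sketch of the idea, not a tested patch: shift each repo id by an offset that every process in a run shares, then take the modulus as before. The helper names `run_offset` and `shifted_mod` are made up for illustration. Reading `GITHUB_RUN_ID` assumes the workflow runs on GitHub Actions, which sets that variable by default; otherwise the sketch falls back to the UTC hour. One caveat with the hour fallback: processes that start on either side of an hour boundary would compute different offsets and could overlap.

```python
import os
from datetime import datetime, timezone


def run_offset(env=None, now=None):
    """Return an int that is constant within one run but changes between runs.

    Hypothetical helper. Prefers GITHUB_RUN_ID (assumption: the script runs
    under GitHub Actions); falls back to the current UTC hour.
    """
    env = os.environ if env is None else env
    run_id = env.get("GITHUB_RUN_ID", "")
    if run_id.isdigit():
        return int(run_id)
    now = datetime.now(timezone.utc) if now is None else now
    return now.hour


def shifted_mod(repo_id, divisor, offset):
    """Assign a repo to a block, rotated by the per-run offset.

    For a fixed offset every repo id still maps to exactly one block,
    so no two processes share a repo; a different offset moves the
    repo to a different block.
    """
    return (repo_id + offset) % divisor
```

With something like this, the existing line could become `mods = _df["id"].apply(lambda x: shifted_mod(x, divisor, offset))`, where `offset = run_offset()` is computed once per process. Consecutive run IDs or hours rotate which block each repo lands in, so a worker slot that keeps failing would not keep starving the same repos.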