Change how modulus is computed #6

@ahouseholder

mods = _df["id"].apply(lambda x: x % divisor)

This line uses the repo id and a modulus to decide how to split repos across parallel runs of the script. The problem is that the mapping from repo to run is fixed, so when an individual run fails repeatedly, the same block of repos never gets worked on.

We can't just randomize it, because then more than one process could end up handling the same repo.

So I'm thinking we need to add in some other factor that is constant across all the parallel processes of an individual run, but changes between runs.
It could be the hour of the day, or maybe there's some run ID that can be converted to an int? The former can come from within the Python code directly, whereas the latter might require modification to the workflow scripts, unless there is already some environment variable there for the Python code to use.
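
To make the idea concrete, here is a minimal sketch of what a per-run offset could look like. It is not the repo's actual code: `run_offset` and `shifted_block` are hypothetical names, and reading `GITHUB_RUN_NUMBER` assumes the workflow runs on GitHub Actions and that all parallel jobs belong to the same workflow run (so they all see the same value). The hour-of-day fallback only works if every parallel process starts within the same UTC hour; otherwise two processes could compute different offsets and overlap.

```python
import os
from datetime import datetime, timezone


def run_offset(env=None, now=None):
    """Return an int that should be the same for every process in one run
    but differ between runs. Both sources below are assumptions."""
    env = os.environ if env is None else env
    # Assumption: GitHub Actions sets GITHUB_RUN_NUMBER for the workflow run.
    run_number = env.get("GITHUB_RUN_NUMBER", "")
    if run_number.isdigit():
        return int(run_number)
    # Fallback: hour of the day in UTC (0-23), available from Python directly.
    now = datetime.now(timezone.utc) if now is None else now
    return now.hour


def shifted_block(repo_id, divisor, offset):
    """Shift the id by the run offset before taking the modulus.
    Each repo still maps to exactly one block, but the block rotates per run."""
    return (repo_id + offset) % divisor


# Hypothetical drop-in for the line quoted above:
#   offset = run_offset()
#   mods = _df["id"].apply(lambda x: shifted_block(x, divisor, offset))
```

Because the offset is added before the modulus, the split is still a partition (no repo is handled twice), but a process that keeps failing gets a different block of repos whenever the offset changes modulo `divisor`.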

Labels

enhancement (New feature or request)