@joshua-hampton
Collaborator

Checking NCAS files requires a call to GitHub for the latest instrument data. When many files were checked at once, this produced a burst of requests, many of which GitHub rejected, resulting in checksit errors. This PR addresses the problem by reducing the number of GitHub requests needed when checking many files:

  • Within the SpecificationChecker class, the module containing the function needed by the spec was imported every time, even if it had already been imported. As a result, modules that themselves import the cvs module with the vocab checks (e.g. checksit.generic) refreshed their vocab cache for each spec check. Modules are now imported only if they have not been imported before, which lets the vocab cache persist across multiple spec checks.
  • A new CLI command, check-files, has been added. It is functionally identical to the check CLI command, except that it takes multiple files as arguments and runs checksit on each. Running it once, rather than running check separately for each file, allows the vocab cache to persist from one file to the next.
  • In case requests are still denied, a function has been added to URL loading that sleeps for a few seconds and then tries again if either a 429 status code is returned or a TimeoutError is raised.
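
The import-once and retry ideas described in the list above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the names get_module and load_url_with_retry, the (status, body) return shape of the fetch callable, and the default retry count and wait time are all assumptions made for the example.

```python
import importlib
import time

# Hypothetical cache of modules already imported by the checker.
_loaded_modules = {}


def get_module(name):
    """Import a module the first time it is requested, then reuse it.

    Reusing the same module object means any module-level state
    (such as a vocab cache) survives from one spec check to the next.
    """
    if name not in _loaded_modules:
        _loaded_modules[name] = importlib.import_module(name)
    return _loaded_modules[name]


def load_url_with_retry(fetch, url, retries=3, wait_seconds=5, sleep=time.sleep):
    """Call fetch(url), retrying after a pause on HTTP 429 or TimeoutError.

    `fetch` is an assumed callable returning a (status_code, body) tuple.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    for attempt in range(retries + 1):
        try:
            status, body = fetch(url)
        except TimeoutError:
            # Treat a timeout like a rejected request and try again.
            status, body = None, None
        if status is not None and status != 429:
            return status, body
        if attempt < retries:
            sleep(wait_seconds)
    raise RuntimeError(f"Giving up on {url} after {retries + 1} attempts")
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP library. A fixed wait is used here to match the "sleeps for a few seconds" description; an increasing wait between attempts would be a common alternative.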

@joshua-hampton joshua-hampton merged commit 6a838c5 into main Apr 29, 2025
5 checks passed
@joshua-hampton joshua-hampton deleted the reduce-import branch April 29, 2025 10:19