@rebkwok rebkwok commented Feb 2, 2026

The weekly build starts on a Friday and typically takes about 5 days, with DB maintenance happening in the last 10-20 hours when the (quick) SwapTables event and the (not quick) CodedEvent_SNOMED rebuild event happen.

Occasionally something delays or slows down a build so that it takes longer than a week, and the next Friday's build starts before the previous one has finished. The previous logic for the maintenance mode check only looked at the most recently started overall build; this meant that if a new build started before the previous one had finished, the new build would have no associated SwapTables/CodedEvent_SNOMED events yet, and we'd report that we were out of maintenance mode when we weren't.

We now look for the two most recent builds, so we can check that they're not both ongoing. If they are both ongoing, we use the earlier of the two to check for associated SwapTables/CodedEvent_SNOMED events. If we determine that we're not in maintenance mode, we now also do a final check to ensure that the CodedEvent_SNOMED table really is available.
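
For illustration, the decision described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the `Build` record, `in_maintenance_mode`, and the two callables (`maintenance_events_active`, `snomed_table_available`) are hypothetical names, and how the events and table are actually queried is assumed away behind those callables.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Build:
    # Hypothetical record of one weekly build; not the PR's real data model.
    started_at: datetime
    finished_at: Optional[datetime]  # None while the build is still ongoing

    @property
    def ongoing(self) -> bool:
        return self.finished_at is None


def in_maintenance_mode(
    latest_two: Sequence[Build],
    maintenance_events_active: Callable[[Build], bool],
    snomed_table_available: Callable[[], bool],
) -> bool:
    """Decide maintenance mode from the (up to) two most recent builds."""
    # Ongoing builds, earliest start first.
    ongoing = sorted(
        (b for b in latest_two if b.ongoing), key=lambda b: b.started_at
    )
    # If both builds are ongoing, ongoing[0] is the earlier one, which is the
    # build whose SwapTables/CodedEvent_SNOMED events we need to look at.
    if ongoing and maintenance_events_active(ongoing[0]):
        return True
    # Final check: only report "not in maintenance mode" if the
    # CodedEvent_SNOMED table really is available.
    return not snomed_table_available()
```

Passing the checks in as callables keeps the overlap handling separate from the database queries, which makes the two-build case easy to exercise in isolation.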

We return both the maintenance mode status and the build count so that the RAP Agent can include the build count in telemetry, and we can set up alerts in Honeycomb if we ever see a build count of 2.
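
As a rough illustration of a combined result, the sketch below serialises both values and flags the overlap case. The JSON shape, the key names (`in_maintenance_mode`, `build_count`), and both function names are assumptions made for this example; the actual output format is whatever this PR and the associated job-runner PR define.

```python
import json


def format_result(in_maintenance: bool, build_count: int) -> str:
    # Hypothetical payload shape; key names are illustrative only.
    return json.dumps(
        {"in_maintenance_mode": in_maintenance, "build_count": build_count}
    )


def builds_overlap(payload: str) -> bool:
    # A build count of 2 means two builds were ongoing at once, which is
    # the condition an alert would be keyed on.
    return json.loads(payload)["build_count"] >= 2
```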

Depends on the associated job-runner PR to handle the new output format.

Closes #11
Closes #12

@evansd evansd left a comment

Nice. The logic has the potential to be a bit convoluted here, but it was all very clearly laid out and commented, so I had no trouble following.

@rebkwok rebkwok marked this pull request as ready for review February 5, 2026 16:49
@rebkwok rebkwok merged commit 78fa023 into main Feb 5, 2026
3 checks passed
@rebkwok rebkwok deleted the rebkwok/db-maintenance-check branch February 5, 2026 16:55

Successfully merging this pull request may close these issues.

Notify/report/something if maintenance modes overlap
Stay in maintenance mode if two maintenance modes overlap badly