Skip to content

Conversation

@thomas-lebeau
Copy link
Collaborator

Motivation

The deployment gate was running longer than intended (~37-40 minutes instead of exactly 30 minutes) due to not accounting for the time spent in telemetry error checks. Each check takes ~15-20 seconds due to API calls and rate limiting delays, which accumulated over 30 iterations, adding 7-10 extra minutes to the gate duration.

Additionally, the progress indicator using process.stdout.write('.') was being buffered in GitLab CI, causing dots to only appear in batches when other log statements flushed the buffer, making it difficult to monitor progress in real-time.

Changes

  • Changed gateTelemetryErrors() to use elapsed time tracking instead of counting iteration. At the end we have less checks, but we still check for 30 minutes
  • Replaced progress message by printLog('✓ Check completed (N/30 minutes)') instead of dots so it will not be buffered.

Test instructions

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.

The telemetry error checking loop was using a fixed iteration counter,
which didn't account for the time spent in checkTelemetryErrors().
This function takes ~15-20 seconds per iteration due to API calls
and rate limiting delays, causing the gate to run for ~37-40 minutes
instead of the intended 30 minutes.

Now using elapsed time tracking to ensure the gate completes in
exactly 30 minutes regardless of check duration.
Replace buffered dot output with timestamped progress messages that
display immediately in GitLab CI logs. Each check now prints:
'✓ Check completed (N/30 minutes)' providing better visibility into
the gate's progress without waiting for buffer flushes.
@thomas-lebeau thomas-lebeau marked this pull request as ready for review January 16, 2026 12:40
@thomas-lebeau thomas-lebeau requested a review from a team as a code owner January 16, 2026 12:40
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e38f1b1a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +69 to 73
while (Date.now() < endTime) {
await checkTelemetryErrors(datacenters, browserSdkVersion)
process.stdout.write('.') // progress indicator
const elapsed = Math.round((Date.now() - startTime) / 1000 / 60)
printLog(`✓ Check completed (${elapsed}/${GATE_DURATION / ONE_MINUTE_IN_SECOND} minutes)`)
await timeout(GATE_INTERVAL * 1000)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip final sleep once gate time has elapsed

Because the time check only happens at the top of the loop, a slow checkTelemetryErrors call can push the wall clock past endTime, but the code still waits GATE_INTERVAL before exiting. In the scenario you described (API calls taking 15–20s), the last iteration can still add an extra full interval, so the gate can exceed 30 minutes by up to GATE_INTERVAL. Consider re-checking the time (or computing the next sleep duration) before calling timeout so the last wait is skipped or shortened.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this can add up to a minute of extra time, it's still an improvement over the 10 extra minutes.

I don't think it's worth adding more complexity though

@thomas-lebeau thomas-lebeau merged commit 421d7e3 into main Jan 16, 2026
22 checks passed
@thomas-lebeau thomas-lebeau deleted the thomas.lebeau/fix-deployment-gate-timing branch January 16, 2026 14:18
@github-actions github-actions bot locked and limited conversation to collaborators Jan 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants