-
Notifications
You must be signed in to change notification settings - Fork 166
👷 Fix deployment gate timing to ensure 30-minute duration #4096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The telemetry error checking loop was using a fixed iteration counter, which didn't account for the time spent in checkTelemetryErrors(). This function takes ~15-20 seconds per iteration due to API calls and rate limiting delays, causing the gate to run for ~37-40 minutes instead of the intended 30 minutes. Now using elapsed time tracking to ensure the gate completes in exactly 30 minutes regardless of check duration.
Replace buffered dot output with timestamped progress messages that display immediately in GitLab CI logs. Each check now prints: '✓ Check completed (N/30 minutes)' providing better visibility into the gate's progress without waiting for buffer flushes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e38f1b1a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| while (Date.now() < endTime) { | ||
| await checkTelemetryErrors(datacenters, browserSdkVersion) | ||
| process.stdout.write('.') // progress indicator | ||
| const elapsed = Math.round((Date.now() - startTime) / 1000 / 60) | ||
| printLog(`✓ Check completed (${elapsed}/${GATE_DURATION / ONE_MINUTE_IN_SECOND} minutes)`) | ||
| await timeout(GATE_INTERVAL * 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip final sleep once gate time has elapsed
Because the time check only happens at the top of the loop, a slow checkTelemetryErrors call can push the wall clock past endTime, but the code still waits GATE_INTERVAL before exiting. In the scenario you described (API calls taking 15–20s), the last iteration can still add an extra full interval, so the gate can exceed 30 minutes by up to GATE_INTERVAL. Consider re-checking the time (or computing the next sleep duration) before calling timeout so the last wait is skipped or shortened.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this can add up to a minute of extra time, it's still an improvement over the 10 extra minutes.
I don't think it's worth adding more complexity though
Motivation
The deployment gate was running longer than intended (~37-40 minutes instead of exactly 30 minutes) due to not accounting for the time spent in telemetry error checks. Each check takes ~15-20 seconds due to API calls and rate limiting delays, which accumulated over 30 iterations, adding 7-10 extra minutes to the gate duration.
Additionally, the progress indicator using
process.stdout.write('.')was being buffered in GitLab CI, causing dots to only appear in batches when other log statements flushed the buffer, making it difficult to monitor progress in real-time.Changes
gateTelemetryErrors()to use elapsed time tracking instead of counting iteration. At the end we have less checks, but we still check for 30 minutesprintLog('✓ Check completed (N/30 minutes)')instead of dots so it will not be buffered.Test instructions
Checklist