DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors#1862
Conversation
- add prose test - add assertions on the number of retries for maxAttempts tests - don't run clientBulkWrite tests on <8.0 servers
|
It looks like you also need to bump the schema version: |
|
WIP Python implementation: mongodb/mongo-python-driver#2635 |
|
All unified and prose tests are passing in the Python implementation. Edit: we're still failing one unified test, "client.clientBulkWrite retries using operation loop", investigating... Edit 2: we're all good now |
jyemin
left a comment
There was a problem hiding this comment.
I only reviewed the specification changes, not the pseudocode or tests. Those are best reviewed by implementers.
|
|
||
| ## Changelog | ||
|
|
||
| - 2025-XX-XX: Initial version. |
There was a problem hiding this comment.
Not sure how we handle the date... Is there an automation for this?
There was a problem hiding this comment.
Not that I know of. Usually the spec author fills it out before merging
I'll just leave this thread open to remind myself to add changelog dates before merging once all changes are completed.
jmikola
left a comment
There was a problem hiding this comment.
Request for changelog entries on the retryable specs. LGTM otherwise.
DRIVERS-3239
Overview
This PR adds support for a new class of errors (
SystemOverloadedError) to drivers' operation retry logic, as outlined in the design document.Additionally, it includes a new argument to the MongoDB handshake (also defined in the design document).
Python will be second implementer.
Node implementation: mongodb/node-mongodb-native#4806
Testing
The testing strategy is two-fold:
Building off of Ezra's work to generate unified tests for retryable handshake errors, this PR generates unified tests to confirm that:
SystemOverloadedErrorlabelFollowing Iris's work in DRIVERS-1934: withTransaction API retries too frequently #1851, this PR adds a prose test that ensures drivers apply exponential backoff in the retryability loop.
Update changelog.
Test changes in at least one language driver.
Test these changes against all server versions and topologies (including standalone, replica set, and sharded
clusters).