@@ -19,6 +19,7 @@ createEntities:
      useMultipleMongoses: false
      observeEvents:
        - commandStartedEvent
        - commandFailedEvent
  - database:
      id: &database database
      client: *client
@@ -104,9 +105,73 @@ tests:
              command:
                insert: *collectionName
                maxTimeMS: { $$type: ["int", "long"] }
          - commandFailedEvent:
              commandName: insert
          - commandStartedEvent:
              commandName: abortTransaction
              databaseName: admin
              command:
                abortTransaction: 1
                maxTimeMS: { $$type: [ "int", "long" ] }

  # This test verifies that when withTransaction encounters transient transaction errors it does not
  # throw the transient transaction error when the timeout is exceeded, but instead surfaces a timeout error after
  # exhausting the retry attempts within the specified timeout.
  # The timeout error thrown contains as a cause the last transient error encountered.
  - description: "withTransaction surfaces a timeout after exhausting transient transaction retries, retaining the last transient error as the timeout cause."
    operations:
      - name: failPoint
        object: testRunner
        arguments:
          client: *failPointClient
          failPoint:
            configureFailPoint: failCommand
            mode: alwaysOn
            data:
              failCommands: ["insert"]
              blockConnection: true
              blockTimeMS: 25
              errorCode: 24
              errorLabels: ["TransientTransactionError"]

      - name: withTransaction
        object: *session
        arguments:
          callback:
            - name: insertOne
              object: *collection
              arguments:
                document: { _id: 1 }
                session: *session
              expectError:
                isError: true
        expectError:
          isTimeoutError: true

    # Verify that multiple insert (at least 2) attempts occurred due to TransientTransactionError retries
    # The exact number of events depends on timing and retry backoff, but there should be at least:
    # - 2 commandStartedEvent for insert (initial + at least one retry)
    # - 2 commandFailedEvent for insert (corresponding failures)
    expectEvents:
      - client: *client
        ignoreExtraEvents: true
        events:
          # First insert attempt
          - commandStartedEvent:
              commandName: insert
          - commandFailedEvent:
              commandName: insert
          - commandStartedEvent:
              commandName: abortTransaction
          - commandFailedEvent:
              commandName: abortTransaction

          # Second insert attempt (retry due to TransientTransactionError)
          - commandStartedEvent:
              commandName: insert
          - commandFailedEvent:
              commandName: insert
          - commandStartedEvent:
              commandName: abortTransaction
          - commandFailedEvent:
              commandName: abortTransaction
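
The `expectEvents` block above lists two full attempts and sets `ignoreExtraEvents: true`, so any further retries are tolerated. A minimal sketch of that prefix-style matching follows. The helper `matchesEventPrefix` and the `{ type, commandName }` event shape are hypothetical and do not come from any driver or test runner.

```javascript
// Hypothetical helper for illustration; not part of any driver or test runner.
// Returns true when `observed` begins with every event in `expected`, in order.
// Events after the expected prefix are tolerated, as with ignoreExtraEvents: true.
function matchesEventPrefix(observed, expected) {
  if (observed.length < expected.length) {
    return false;
  }
  return expected.every(
    (event, i) =>
      observed[i].type === event.type &&
      observed[i].commandName === event.commandName
  );
}

// One failed transaction attempt, as listed in the test above.
const attempt = [
  { type: "commandStartedEvent", commandName: "insert" },
  { type: "commandFailedEvent", commandName: "insert" },
  { type: "commandStartedEvent", commandName: "abortTransaction" },
  { type: "commandFailedEvent", commandName: "abortTransaction" },
];

// The test expects at least two such attempts.
const expectedEvents = [...attempt, ...attempt];

// Three observed attempts satisfy the expectation; a single attempt does not.
const threeAttemptsMatch = matchesEventPrefix(
  [...attempt, ...attempt, ...attempt],
  expectedEvents
);
const oneAttemptMatches = matchesEventPrefix(attempt, expectedEvents);
```

Matching only a prefix keeps the test stable even though the exact number of retries depends on timing and backoff.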
16 changes: 12 additions & 4 deletions source/transactions-convenient-api/tests/README.md
@@ -30,17 +30,23 @@ Drivers should test that `withTransaction` enforces a non-configurable timeout b
transactions. Specifically, three cases should be checked:

- If the callback raises an error with the TransientTransactionError label and the retry timeout has been exceeded,
`withTransaction` should propagate the error to its caller.
`withTransaction` should propagate the error (see Note 1 below) to its caller.
- If committing raises an error with the UnknownTransactionCommitResult label, and the retry timeout has been exceeded,
`withTransaction` should propagate the error to its caller.
`withTransaction` should propagate the error (see Note 1 below) to its caller.
- If committing raises an error with the TransientTransactionError label and the retry timeout has been exceeded,
`withTransaction` should propagate the error to its caller. This case may occur if the commit was internally retried
against a new primary after a failover and the second primary returned a NoSuchTransaction error response.
`withTransaction` should propagate the error (see Note 1 below) to its caller. This case may occur if the commit was
internally retried against a new primary after a failover and the second primary returned a NoSuchTransaction error
response.

If possible, drivers should implement these tests without requiring the test runner to block for the full duration of
the retry timeout. This might be done by internally modifying the timeout value used by `withTransaction` with some
private API or using a mock timer.
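
One way to avoid blocking for the full retry timeout is to inject the clock. The sketch below is a simplified, synchronous stand-in for the retry loop, not a real driver API: `createFakeClock`, `retryUntilTimeout`, and the fixed `backoffMs` are hypothetical names introduced for illustration. It follows the `>=` comparison used in the pseudo-code and wraps the last transient error as the cause of the timeout error.

```javascript
// Hypothetical fake clock: time only advances when sleep() is called.
function createFakeClock() {
  let nowMs = 0;
  return {
    now: () => nowMs,
    sleep: (ms) => {
      nowMs += ms;
    },
  };
}

// Simplified, synchronous stand-in for the retry loop (assumption, not a driver API).
function retryUntilTimeout(callback, { timeoutMs, backoffMs, clock }) {
  const start = clock.now();
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      return callback();
    } catch (error) {
      const transient =
        Array.isArray(error.errorLabels) &&
        error.errorLabels.includes("TransientTransactionError");
      if (!transient) {
        throw error; // non-transient errors propagate unchanged
      }
      if (clock.now() + backoffMs - start >= timeoutMs) {
        // Retry budget exhausted: surface a timeout that keeps the last error as its cause.
        const timeoutError = new Error("retry timeout exceeded", { cause: error });
        timeoutError.attempts = attempts;
        throw timeoutError;
      }
      clock.sleep(backoffMs); // advances the fake clock instantly
    }
  }
}

const clock = createFakeClock();
const transientError = Object.assign(new Error("simulated transient failure"), {
  errorLabels: ["TransientTransactionError"],
});

let caught;
try {
  retryUntilTimeout(
    () => {
      throw transientError;
    },
    { timeoutMs: 120000, backoffMs: 500, clock }
  );
} catch (error) {
  caught = error;
}
```

With the fake clock, the 120 second budget is consumed without any real waiting, and `caught.cause` is the last transient error.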

______________________________________________________________________

**Note 1:** The error SHOULD be propagated as a timeout error if the language allows exposing the underlying error as
the cause of a timeout error.

### Retry Backoff is Enforced

Drivers should test that retries within `withTransaction` do not occur immediately.
@@ -106,6 +112,8 @@ Drivers should test that retries within `withTransaction` do not occur immediate

## Changelog

- 2026-02-17: Clarify expected error when timeout is reached
[DRIVERS-3391](https://jira.mongodb.org/browse/DRIVERS-3391).
- 2026-01-07: Fixed Retry Backoff is Enforced test according to the updated spec.
- 2025-11-18: Added Backoff test.
- 2024-09-06: Migrated from reStructuredText to Markdown.
49 changes: 36 additions & 13 deletions source/transactions-convenient-api/transactions-convenient-api.md
@@ -123,8 +123,8 @@ This method should perform the following sequence of actions:

2. If `transactionAttempt` > 0:

1. If elapsed time + `backoffMS` > `TIMEOUT_MS`, then raise the previously encountered error. If the elapsed time of
`withTransaction` is less than TIMEOUT_MS, calculate the backoffMS to be
1. If elapsed time + `backoffMS` > `TIMEOUT_MS`, then raise the previously encountered error (see Note 1 below). If
the elapsed time of `withTransaction` is less than TIMEOUT_MS, calculate the backoffMS to be
`jitter * min(BACKOFF_INITIAL * 1.5 ** (transactionAttempt - 1), BACKOFF_MAX)`. sleep for `backoffMS`.

1. jitter is a random float between \[0, 1)
@@ -162,7 +162,8 @@ This method should perform the following sequence of actions:
committed a transaction, propagate the callback's error to the caller of `withTransaction` and return
immediately.

4. Otherwise, propagate the callback's error to the caller of `withTransaction` and return immediately.
4. Otherwise, propagate the callback's error (see Note 1 below) to the caller of `withTransaction` and return
immediately.

8. If the ClientSession is in the "no transaction", "transaction aborted", or "transaction committed" state, assume the
callback intentionally aborted or committed the transaction and return immediately.
@@ -178,10 +179,21 @@ This method should perform the following sequence of actions:

2. If the `commitTransaction` error includes a "TransientTransactionError" label, jump back to step two.

3. Otherwise, propagate the `commitTransaction` error to the caller of `withTransaction` and return immediately.
3. Otherwise, propagate the `commitTransaction` error (see Note 1 below) to the caller of `withTransaction` and
return immediately.

11. The transaction was committed successfully. Return immediately.
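
The backoff rule in step 2.1 can be sketched as follows. The function names are hypothetical, and the values `5` and `500` for `BACKOFF_INITIAL` and `BACKOFF_MAX` are illustrative assumptions, since the steps shown here do not state them. The timeout check uses `>` as in the prose of step 2.1; the pseudo-code uses `>=`.

```javascript
// Hypothetical helper names; only the formula itself comes from step 2.1.
function computeBackoffMs(transactionAttempt, backoffInitialMs, backoffMaxMs, random = Math.random) {
  const jitter = random(); // random float in [0, 1)
  const uncapped = backoffInitialMs * 1.5 ** (transactionAttempt - 1);
  return jitter * Math.min(uncapped, backoffMaxMs);
}

// Step 2.1 check: true when sleeping for backoffMs would exceed TIMEOUT_MS.
function backoffWouldExceedTimeout(elapsedMs, backoffMs, timeoutMs) {
  return elapsedMs + backoffMs > timeoutMs;
}

// Illustrative values only (assumed, not taken from the steps above).
const BACKOFF_INITIAL_MS = 5;
const BACKOFF_MAX_MS = 500;
const fixedJitter = () => 0.5; // deterministic stand-in for Math.random

const firstRetry = computeBackoffMs(1, BACKOFF_INITIAL_MS, BACKOFF_MAX_MS, fixedJitter);
const cappedRetry = computeBackoffMs(100, BACKOFF_INITIAL_MS, BACKOFF_MAX_MS, fixedJitter);
```

Passing the random source as a parameter keeps the calculation deterministic in tests while defaulting to `Math.random` in normal use.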

______________________________________________________________________

**Note 1:** When `TIMEOUT_MS` (calculated in step [1.3](#sequence-of-actions)) is reached, drivers MUST report a
timeout error wrapping the last error that triggered the retry behavior. If `timeoutMS` is set, the timeout error is
the special type defined in the CSOT
[specification](https://github.com/mongodb/specifications/blob/master/source/client-side-operations-timeout/client-side-operations-timeout.md#errors).
If `timeoutMS` is not set, propagate it as a timeout error if the language allows exposing the underlying error as the
cause of a timeout error (see `makeTimeoutError` below in [pseudo-code](#pseudo-code)). If a timeout error is thrown,
it SHOULD expose the error label(s) from the transient error.
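
A minimal sketch of the wrapping described in Note 1, assuming a language with a native error-cause mechanism. The class names `LegacyTimeoutError` and `CsotTimeoutError`, the `errorLabels` array, and the message text are hypothetical; real drivers define their own timeout error types.

```javascript
// Hypothetical timeout error that keeps the last error as its cause and
// copies the error labels from it, as Note 1 suggests.
class LegacyTimeoutError extends Error {
  constructor(message, cause) {
    super(message, { cause });
    this.name = "LegacyTimeoutError";
    this.errorLabels = Array.isArray(cause?.errorLabels) ? [...cause.errorLabels] : [];
  }

  hasErrorLabel(label) {
    return this.errorLabels.includes(label);
  }
}

// Hypothetical stand-in for the timeout error type defined by the CSOT specification.
class CsotTimeoutError extends LegacyTimeoutError {
  constructor(message, cause) {
    super(message, cause);
    this.name = "CsotTimeoutError";
  }
}

// Picks the error type based on whether timeoutMS is set (null or undefined means unset).
function makeTimeoutError(lastError, timeoutMS) {
  const message = "withTransaction timed out while retrying";
  return timeoutMS != null
    ? new CsotTimeoutError(message, lastError)
    : new LegacyTimeoutError(message, lastError);
}

const lastTransientError = Object.assign(new Error("simulated transient failure"), {
  errorLabels: ["TransientTransactionError"],
});
const legacyTimeout = makeTimeoutError(lastTransientError, undefined);
const csotTimeout = makeTimeoutError(lastTransientError, 1000);
```

Callers can then inspect `legacyTimeout.cause` or call `hasErrorLabel("TransientTransactionError")` on either error to see what triggered the retries.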

##### Pseudo-code

This method can be expressed by the following pseudo-code:
@@ -203,7 +215,7 @@ withTransaction(callback, options) {
BACKOFF_MAX);

if (Date.now() + backoff - startTime >= timeout) {
throw lastError;
throw makeTimeoutError(lastError);
}
sleep(backoff);
}
@@ -220,9 +232,12 @@ withTransaction(callback, options) {
this.abortTransaction();
Contributor

This is outside the diff, but we also need to clarify the error handling behavior on line 206.

Contributor

Also out of diff, but can you also update the prose description accordingly?

}

if (error.hasErrorLabel("TransientTransactionError") &&
Date.now() - startTime < timeout) {
continue retryTransaction;
if (error.hasErrorLabel("TransientTransactionError")) {
if (Date.now() - startTime < timeout) {
continue retryTransaction;
} else {
throw makeTimeoutError(error)
Member

@vbabanin vbabanin Feb 11, 2026

As I understand it from the PR and ticket descriptions, the change in this PR is intended to clarify CSOT behaviour in case of a timeout.

The non-timeoutMS (legacy 120-second timeout) path in pseudocode currently contradicts the prose spec requirements.

1) CSOT spec forbids changing the legacy path when timeoutMS is not set
client-side-operations-timeout.md (lines 401–402):

“If timeoutMS is not set, drivers MUST continue to exhibit the existing 120 second timeout behavior. Drivers MUST NOT change existing implementations to use timeoutMS=120000 for this case.”

client-side-operations-timeout.md (lines 54–59):

“If drivers need to make backwards-breaking changes to support timeoutMS, the backwards breaking behavior MUST be gated behind the presence of the timeoutMS option... Backwards breaking changes include any changes to exception types thrown by stable API methods or changes to timeout behavior.”

withTransaction is a stable API method. If the legacy path (no timeoutMS) now throws a timeout wrapper instead of the previously encountered error, that changes the exception type for a stable API without being gated behind timeoutMS.

2) Convenient transactions prose specifies propagating the previously encountered error without wrapping it in a timeout exception
Lines 97–102:

“If the retry timeout has been exceeded, drivers MUST NOT retry the transaction and allow withTransaction to propagate the error to its caller.”

Step 2.1:

“If elapsed time + backoffMS > TIMEOUT_MS, then raise the previously encountered error.”

This describes throwing the same error that triggered the retry loop (the previously encountered TransientTransactionError), not a new timeout wrapper. The prose should be updated alongside the pseudocode change, so the two are consistent.

Given the above, I think we should:

  • keep the legacy behavior when timeoutMS is not set (i.e., propagate the previously encountered error).
  • update the prose spec to reflect the intended CSOT (timeoutMS-set) behavior if the pseudocode now wraps the error in a timeout exception.

Contributor Author

Good points 👍

  • The spec doesn't define exception types/hierarchy, only "errors". In Java, MongoTimeoutException extends MongoClientException, which extends MongoException, so users catching these parent types won't break. This is driver-specific but doesn't violate the spec AFAIK.

  • Non-transient error behavior unchanged: Only transient errors get wrapped when timeout occurs. Non-transient errors like DuplicateKeyException continue to be thrown directly to users as before.

  • Convenient API design contradiction: The withTransaction convenient API is designed to automatically retry transient errors. Users catching specific transient exceptions defeats the purpose - they should use Core API if they want that level of control.

The key question:
When withTransaction exhausts its timeout budget while retrying transient errors, should we throw the last transient error directly, or wrap it in a timeout exception?

Timeout is the only exit for transient errors: In the retry loop, transient errors keep getting retried until timeout is reached. The timeout becomes the actual failure reason from the user's perspective. (The transient error is a contributing factor but not the ultimate cause of failure.)

Contributor

Wrapping the last transient error in a timeout exception is much clearer. Otherwise the user will get an error that is not actually the cause of their withTransaction stopping execution, which is confusing.

The legacy workflow potentially broken by this clarification would be a user explicitly catching all of the transient exceptions for custom handling. I agree with Nabil that such a use case seems at odds with the withTransaction convenient API.

Member

@vbabanin vbabanin Feb 13, 2026

To clarify: I’m not arguing against wrapping transient errors in a timeout exception in general. Conceptually, a timeout error is a cleaner terminal outcome for both CSOT and the legacy (pre-CSOT) path.

The concern is scope/spec alignment. The ticket/PR is framed as "Clarify withTransaction CSOT timeout", but the pseudocode change also alters the legacy (no timeoutMS) behavior. Is the legacy behavior change intended as part of this CSOT ticket, or is it a separate effort? If it’s intentional, it would be good to make that explicit in the PR title/description (and ticket, if possible) so scope is unambiguous.

If this is part of CSOT, the CSOT spec currently requires preserving legacy behavior when timeoutMS is not set:
client-side-operations-timeout.md#401–402:

"If timeoutMS is not set, drivers MUST continue to exhibit the existing 120 second timeout behavior."

In any case, the prose needs to be updated alongside the pseudocode, and the legacy prose tests need clarification if legacy behavior is changing. Currently they say:
transactions-convenient-api/tests/README.md#L34-L35:

"If the callback raises an error with the TransientTransactionError label and the retry timeout has been exceeded, withTransaction should propagate the error to its caller."

As written, that appears to describe "propagate the previously encountered error" (which was the intent previously), not "throw a timeout".

One more clarification: CSOT tests explicitly verify the timeout exception carries the last transient error as the cause:

specified timeout. The timeout error thrown contains as a cause the last transient error encountered.

Should the legacy timeout exception be required to carry the last transient error as the cause as well? The pseudocode calls createLegacyMongoTimeoutException(...), but the semantics (cause/labels/etc.) aren’t specified. This should be spelled out in the prose and tests so drivers don’t diverge.

}
}

throw error;
@@ -247,15 +262,16 @@ withTransaction(callback, options) {
* {ok:0, code: 50, codeName: "MaxTimeMSExpired"}
* {ok:1, writeConcernError: {code: 50, codeName: "MaxTimeMSExpired"}}
*/
lastError = error;
if (Date.now() - startTime >= timeout) {
throw makeTimeoutError(error);
}
if (!isMaxTimeMSExpiredError(error) &&
error.hasErrorLabel("UnknownTransactionCommitResult") &&
Date.now() - startTime < timeout) {
error.hasErrorLabel("UnknownTransactionCommitResult")) {
continue retryCommit;
}

if (error.hasErrorLabel("TransientTransactionError") &&
Date.now() - startTime < timeout) {
lastError = error;
if (error.hasErrorLabel("TransientTransactionError")) {
continue retryTransaction;
}

@@ -266,6 +282,10 @@ withTransaction(callback, options) {
break; // Transaction was successful
}
}

function makeTimeoutError(error) {
    return getCSOTTimeoutIfSet() != null
        ? createCSOTMongoTimeoutException(error)
        : createLegacyMongoTimeoutException(error);
}
```
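
The commit-phase branching in the updated pseudo-code can be summarized as a small decision function. This is a sketch under assumptions: `nextCommitAction`, `hasLabel`, and the plain-object error shape (`code`, `writeConcernError`, `errorLabels`) are hypothetical, and the MaxTimeMSExpired check only covers the two response shapes listed in the pseudo-code comment (code 50 at the top level or inside `writeConcernError`).

```javascript
// Assumed error shape: { code, writeConcernError: { code }, errorLabels: [...] }.
function isMaxTimeMSExpiredError(error) {
  return error.code === 50 || error.writeConcernError?.code === 50;
}

function hasLabel(error, label) {
  return Array.isArray(error.errorLabels) && error.errorLabels.includes(label);
}

// Mirrors the order of checks in the updated commit loop:
// timeout first, then commit retry, then full transaction retry, else propagate.
function nextCommitAction(error, elapsedMs, timeoutMs) {
  if (elapsedMs >= timeoutMs) {
    return "throwTimeoutError";
  }
  if (!isMaxTimeMSExpiredError(error) && hasLabel(error, "UnknownTransactionCommitResult")) {
    return "retryCommit";
  }
  if (hasLabel(error, "TransientTransactionError")) {
    return "retryTransaction";
  }
  return "throwError";
}

const unknownResult = { code: 91, errorLabels: ["UnknownTransactionCommitResult"] };
const maxTimeExpired = { code: 50, errorLabels: ["UnknownTransactionCommitResult"] };
const transient = { code: 251, errorLabels: ["TransientTransactionError"] };

const actions = [
  nextCommitAction(unknownResult, 1000, 120000),
  nextCommitAction(maxTimeExpired, 1000, 120000),
  nextCommitAction(transient, 1000, 120000),
  nextCommitAction(transient, 120000, 120000),
];
```

Checking the elapsed time before the labels means every commit error seen after the deadline becomes a timeout error, which matches the order of the checks in the updated pseudo-code.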

### ClientSession must provide access to a MongoClient
@@ -419,6 +439,9 @@ provides an implementation of a technique already described in the MongoDB 4.0 d

## Changelog

- 2026-02-17: Clarify expected error when timeout is reached
[DRIVERS-3391](https://jira.mongodb.org/browse/DRIVERS-3391).

- 2025-11-20: withTransaction applies exponential backoff when retrying.

- 2024-09-06: Migrated from reStructuredText to Markdown.