PYTHON-5219 - Avoid awaiting coroutines when holding locks #2250
NoahStapp merged 3 commits into mongodb:master
Conversation
pymongo/asynchronous/topology.py
Outdated
    await server.close()
    close_servers.append(server)
    if not _IS_SYNC:
        self._monitor_tasks.append(server._monitor)
Unrelated to this PR but why do we append to _monitor_tasks here?
To cleanup all the monitor tasks owned by the closed servers.
Since asyncio doesn't support forking at all, should we just remove this? It seems like non-functional code.
What if we replaced the entire fork branch here with a warning on async?
Before we make any code changes someone should test out the current behavior of fork+asyncio. Depending on the behavior we might need to reopen https://jira.mongodb.org/browse/PYTHON-5249.
This example:
import os
from pymongo import AsyncMongoClient
import asyncio

async def test_func():
    client = AsyncMongoClient()
    await client.aconnect()
    pid = os.fork()
    if pid == 0:
        await client.db.test.insert_one({'a': 1})
        exit()
    print("Done!")

asyncio.run(test_func())
Produces the following error multiple times, each with different tracebacks:
Traceback (most recent call last):
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/asyncio/base_events.py", line 708, in run_until_complete
    self.run_forever()
    ~~~~~~~~~~~~~~~~^^
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/asyncio/base_events.py", line 679, in run_forever
    self._run_once()
    ~~~~~~~~~~~~~~^^
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/asyncio/base_events.py", line 1989, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/nstapp/.pyenv/versions/3.13.0/lib/python3.13/selectors.py", line 548, in select
    kev_list = self._selector.control(None, max_ev, timeout)
ValueError: I/O operation on closed kqueue object
The same error happens without using pymongo at all:
import asyncio
import os

async def test_func():
    pid = os.fork()
    if pid == 0:
        await asyncio.sleep(.01)
        print("Done child!")
        exit()
    await asyncio.sleep(.1)
    print("Done parent!")

asyncio.run(test_func())
pymongo/asynchronous/topology.py
Outdated
    # Close servers and clear the pools.
    for server in self._servers.values():
        await server.close()
        close_servers.append(server)
We close the servers here but we leave self._servers untouched? Is using a client post fork broken right now? I don't see where the Server gets recreated.
Our test_fork.py tests are all passing. We re-open each server in self._servers in _ensure_opened, called at the end of open here.
Here's the race I'm concerned about:
- app forks with an open client
- child process starts 2 threads that both call find_one()
- both threads see a different PID and enter this if-block.
- T1 acquires the lock first, resets the servers, reopens and then proceeds to server selection.
- T2 then closes the server selected by T1 which causes a PoolClosedError in T1.
It could be that this is already possible with the current code. What do you think?
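The check-then-reset pattern behind this race can be sketched with a toy model (hypothetical names, not the actual pymongo code): the PID comparison happens before the lock is taken, so two threads can both observe a stale PID and both run the reset body.

```python
import os
import threading

class Topology:
    """Toy model of the post-fork reset path (hypothetical names,
    not the actual pymongo implementation)."""

    def __init__(self):
        self._pid = os.getpid()
        self._lock = threading.Lock()
        self._resets = 0  # how many times the reset body ran

    def _check_fork(self):
        # The PID check happens *before* the lock is acquired, so two
        # threads can both see a stale PID and both enter the block,
        # each resetting (and closing) servers in turn.
        if self._pid != os.getpid():
            with self._lock:
                self._pid = os.getpid()
                self._resets += 1  # close + reopen servers here

topo = Topology()
topo._check_fork()   # same PID: no reset
topo._pid = -1       # simulate a thread observing the pre-fork PID
topo._check_fork()   # reset runs
print(topo._resets)  # → 1
```

With two real threads, the second one that passed the unlocked check would run the reset body again after the first had already reopened and begun using the servers.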
The addition of async hasn't changed the structure of this code, only the async/await syntax, so if this race condition does exist, it's existed for some time. Here's the identical PyMongo 4.8 version, before we added async support:
mongo-python-driver/pymongo/topology.py, line 177 at de0f46a
I agree looking at the code that scenario certainly seems possible. We could add a flag set post-fork to prevent that race condition?
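One shape such a guard could take (a sketch with hypothetical names, not a proposed patch) is a second PID check under the lock, so only the thread that wins the lock performs the reset:

```python
import os
import threading

class Topology:
    """Toy sketch of a post-fork guard (hypothetical names): re-check
    the PID under the lock so only one thread runs the reset."""

    def __init__(self):
        self._pid = os.getpid()
        self._lock = threading.Lock()
        self._resets = 0

    def _check_fork(self):
        if self._pid != os.getpid():
            with self._lock:
                # Second check under the lock: threads that lost the
                # race see the updated PID and skip the reset.
                if self._pid != os.getpid():
                    self._pid = os.getpid()
                    self._resets += 1

topo = Topology()
topo._pid = -1  # simulate all threads seeing the stale pre-fork PID
threads = [threading.Thread(target=topo._check_fork) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(topo._resets)  # → 1, even with many racing threads
```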
Not concerned about async here. I'm concerned that delaying the server.close() call to after we release the lock will make this race more likely.
I just realized: since we don't support fork + async at all, this change is also non-functional and can be reverted.
I mention that this code hasn't changed besides the addition of async to show that this race condition has either been present for quite some time, or doesn't exist.
Yeah let's undo the forking changes. Holding the lock while calling close() in the sync version isn't so bad in this case because it only happens once post fork().
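For reference, the general pattern the PR title describes, collecting work under a lock and awaiting the coroutines only after releasing it, can be sketched like this (standalone illustration with stand-in classes, not pymongo's actual code):

```python
import asyncio

class Server:
    """Stand-in for a server object (hypothetical, for illustration)."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    async def close(self):
        await asyncio.sleep(0)  # closing may yield to the event loop
        self.closed = True

async def reset(servers, lock):
    # Collect under the lock without awaiting any server coroutines...
    to_close = []
    async with lock:
        for server in servers:
            to_close.append(server)
    # ...and await the close() calls only after the lock is released,
    # so no other task blocks on the lock across an await point.
    for server in to_close:
        await server.close()

servers = [Server("a"), Server("b")]

async def main():
    await reset(servers, asyncio.Lock())

asyncio.run(main())
print(all(s.closed for s in servers))  # → True
```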