PYTHON-4947 - GridFS spec: Add performant 'delete revisions by filena…#2218
PYTHON-4947 - GridFS spec: Add performant 'delete revisions by filena…#2218NoahStapp merged 7 commits intomongodb:masterfrom
Conversation
…me' feature - delete_by_name
ShaneHarvey
left a comment
There was a problem hiding this comment.
Let's add the changelog entry here too.
gridfs/asynchronous/grid_file.py
Outdated
| """ | ||
| _disallow_transactions(session) | ||
| files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
| file_ids = [file_id["_id"] async for file_id in files] |
There was a problem hiding this comment.
[file["_id"] async for file in files]
| _disallow_transactions(session) | ||
| files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
| file_ids = [file_id["_id"] async for file_id in files] | ||
| res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session) |
There was a problem hiding this comment.
Does the spec say anything about what should happen if file_ids is so large/numerous it overflows maxBsonObjectSize? Assuming all ids are ObjectIds, this will happen when a single filename has around 850,000 revisions:
>>> len(encode({"_id": {"$in": [ObjectId()]*850000}}))
16888915
>>> MAX_BSON_SIZE
16777216There was a problem hiding this comment.
It makes no mention of that edge case. Do we have a standard pattern for overflow issues in other APIs?
| files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
| file_ids = [file_id["_id"] async for file_id in files] | ||
| res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session) | ||
| await self._chunks.delete_many({"files_id": {"$in": file_ids}}, session=session) |
There was a problem hiding this comment.
We could improve this by using client.bulk_write on 8.0+ servers to combine both deletes into one command but that's a spec issue.
| if not res.deleted_count: | ||
| raise NoFile("no file could be deleted because none matched %s" % file_id) | ||
|
|
||
| @_csot.apply |
There was a problem hiding this comment.
We need to add this helper to all the tests for sessions/transactions/CSOT. For example test_gridfs_bucket in test_session and test_gridfs_does_not_support_transactions.
There was a problem hiding this comment.
Should this be a separate ticket?
There was a problem hiding this comment.
Shouldn't the existing tests missing this decorator have it added in a separate ticket? Those changes seem unrelated to this addition.
There was a problem hiding this comment.
It's completely related. We're adding a new api so we have to test all the features. Although I'm not not what you mean by "tests missing this decorator".
There was a problem hiding this comment.
I'm saying we need to add the delete_by_name helper to the tests for sessions/transactions/CSOT...
| self.assertEqual(0, await self.db.fs.files.count_documents({})) | ||
| self.assertEqual(0, await self.db.fs.chunks.count_documents({})) | ||
| gfs = gridfs.AsyncGridFSBucket(self.db) | ||
| await gfs.upload_from_stream("test_filename", b"hello", chunk_size_bytes=1) |
There was a problem hiding this comment.
This test should upload multiple versions of test_filename and assert all are deleted.
There was a problem hiding this comment.
Oh I see the spec test already does this so you can disregard.
…me' feature - delete_by_name