The aggregate command can return either a cursor or store the results in a collection. When returning a cursor or storing the results in a collection, each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes; if any single document that exceeds the BSON Document Size limit, the command will produce an error. The limit only applies to the returned documents; during the pipeline processing, the documents may exceed this size. The db.collection.aggregate() method returns a cursor by default.
each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes
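I wanted to know whether this "result set" is the result returned by aggregate. If it is, then the limit applies to each individual element of the result set, which cannot exceed 16 MB; otherwise, it would be the combined size of the whole result set that cannot exceed 16 MB.
The conclusion is that a single document in the result cannot exceed the limit.
To reproduce this, I used two documents of 10 MB each: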
from pymongo import MongoClient
from unittest import TestCase


class TestAggregateSizeLimit(TestCase):

    def setUp(self):
        self.client = MongoClient()
        self.coll = self.client['test-database']['test-collection']
        # 10mb.txt is a text file of about 10 MB; insert its content twice
        with open('10mb.txt', 'r') as f:
            content = f.read()
        self.coll.insert_one({
            'filename': 1,
            'content': content
        })
        self.coll.insert_one({
            'filename': 2,
            'content': content
        })

    def tearDown(self):
        self.client.drop_database('test-database')

    def test_two_aggregate_result(self):
        # Two result documents of about 10 MB each: about 20 MB in total
        result = list(self.coll.aggregate(
            [
                {'$sort': {'_id': 1}},
                {'$group': {'_id': '$filename', 'content': {'$first': '$content'}}}
            ]
        ))
        if result:
            print('Total size of all documents exceeds 16 MB, but no single document does: OK')
        else:
            print('Total size of all documents exceeds 16 MB, but no single document does: FAILED')

    def test_one_aggregate_result(self):
        # One result document holding both contents: about 20 MB in a single document
        try:
            list(self.coll.aggregate(
                [
                    {'$group': {'_id': None, 'content': {'$push': '$content'}}}
                ]
            ))
        except Exception as e:
            # pymongo==2.8 raises “$cmd failed: aggregation result exceeds maximum document size (16MB)”
            # pymongo==3.7.0 raises “BSONObj size: 20971635 (0x1400073) is invalid. Size must be between 0 and 16793600(16MB) ”
            print(e)
            print('A single document in the result exceeds 16 MB: error raised')
        else:
            print('A single document in the result exceeds 16 MB: no error')
The full code is at https://github.com/Jay54520/playground/tree/master/mongodb_size_limit
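The two outcomes can also be checked on paper, without a server. The following is a rough sketch rather than part of the original test: `bson_size` is a hypothetical helper written for this post, it handles only a handful of Python types, and it assumes the file holds exactly 10 MiB of ASCII text.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16 MiB


def bson_size(value):
    """Estimate the encoded BSON size of a value, in bytes.

    Hypothetical helper: only None, bool, int, str, list and dict are
    handled. The type byte and key name of each element are counted by
    the enclosing document, not here.
    """
    if value is None:
        return 0  # null carries no payload
    if isinstance(value, bool):
        return 1  # a single byte
    if isinstance(value, int):
        # 32-bit integer when the value fits, 64-bit otherwise
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, str):
        # int32 length prefix + UTF-8 bytes + trailing NUL
        return 4 + len(value.encode('utf-8')) + 1
    if isinstance(value, list):
        # arrays are encoded as documents keyed "0", "1", ...
        value = {str(i): item for i, item in enumerate(value)}
    if isinstance(value, dict):
        size = 4 + 1  # int32 total length + terminating NUL
        for key, item in value.items():
            # type byte + key as a C string + the value itself
            size += 1 + len(key.encode('utf-8')) + 1 + bson_size(item)
        return size
    raise TypeError('unsupported type: %r' % type(value))


# Stand-in for the content of 10mb.txt (assumed to be exactly 10 MiB)
content = 'x' * (10 * 1024 * 1024)

# Shape of one result document from the $first pipeline
grouped = {'_id': 1, 'content': content}
# Shape of the single result document from the $push pipeline
pushed = {'_id': None, 'content': [content, content]}

for name, doc in (('grouped', grouped), ('pushed', pushed)):
    size = bson_size(doc)
    verdict = 'fits' if size <= MAX_BSON_DOCUMENT_SIZE else 'too large'
    print(name, size, verdict)
```

Each grouped document comes out at roughly 10 MiB and fits, while the pushed document comes out at roughly 20 MiB and does not. The estimate for the pushed document is close to, but not identical to, the 20971635 bytes in the PyMongo 3.7.0 error message; the sketch does not model whatever the server adds around the result, and the real file may not be exactly 10 MiB.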
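Also, while searching I found someone claiming that allowDiskUse can lift this limit. That is wrong. allowDiskUse is used to avoid the error raised when a pipeline stage uses more than 100 MB of memory, whereas the limit above applies to a single document.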
Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.[2]
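To check this against a real server, the single-document pipeline can be rerun with allowDiskUse turned on. This is a minimal sketch, not taken from the repository: `push_all_with_disk_use` is a hypothetical helper, and `coll` is assumed to be a PyMongo collection such as `self.coll` in the test above. It catches a bare `Exception` because the exception and message differ between PyMongo versions.

```python
def push_all_with_disk_use(coll, field='content'):
    """Run the single-document $push pipeline with allowDiskUse=True.

    Hypothetical helper: `coll` is assumed to be a PyMongo collection.
    Returns a (documents, error) pair; error is None on success.
    """
    pipeline = [
        {'$group': {'_id': None, field: {'$push': '$' + field}}}
    ]
    try:
        # allowDiskUse lets stages write to temporary files; it is passed
        # to aggregate() as a keyword argument.
        return list(coll.aggregate(pipeline, allowDiskUse=True)), None
    except Exception as exc:
        # The exception type and message vary across PyMongo versions.
        return [], exc
```

Calling `push_all_with_disk_use(self.coll)` against the two 10 MB documents should still return an error if the explanation above is right, because the option concerns a stage's memory use, not the size of a result document.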