MongoDB aggregation result size limit


The aggregate command can return either a cursor or store the results in a collection. When returning a cursor or storing the results in a collection, each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes; if any single document that exceeds the BSON Document Size limit, the command will produce an error. The limit only applies to the returned documents; during the pipeline processing, the documents may exceed this size. The db.collection.aggregate() method returns a cursor by default.[1]

each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes

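I wanted to know whether this "result set" is the result returned by aggregate. If it is, then each individual element of the result set must stay under 16 MB; if it is not, then the combined size of the whole result set must stay under 16 MB.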

The conclusion: the limit applies to each single document in the result, not to the result as a whole.

A simulation with two 10 MB documents:

from pymongo import MongoClient
from unittest import TestCase


class TestAggregateSizeLimit(TestCase):

    def setUp(self):
        self.client = MongoClient()
        self.coll = self.client['test-database']['test-collection']

        with open('10mb.txt', 'r') as f:
            content = f.read()

        self.coll.insert_one({
            'filename': 1,
            'content': content
        })
        self.coll.insert_one({
            'filename': 2,
            'content': content
        })

    def tearDown(self):
        self.client.drop_database('test-database')

    def test_two_aggregate_result(self):
        result = list(self.coll.aggregate(
            [
                {'$sort': {'_id': 1}},
                {'$group': {'_id': '$filename', 'content': {'$first': '$content'}}}
            ]
        ))

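        # Two result documents of about 10 MB each: more than 16 MB in total,
        # but each one is under the limit.
        if result:
            print('Total exceeds 16 MB but no single document does: OK')
        else:
            print('Total exceeds 16 MB but no single document does: failed')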

    def test_one_aggregate_result(self):
        try:
            list(self.coll.aggregate(
                [
                    {'$group': {'_id': None, 'content': {'$push': '$content'}}}
                ]
            ))
        except Exception as e:
            # pymongo==2.8 raises "$cmd failed: aggregation result exceeds maximum document size (16MB)"
            # pymongo==3.7.0 raises "BSONObj size: 20971635 (0x1400073) is invalid. Size must be between 0 and 16793600(16MB)"
            print(e)
            print('A single result document exceeds 16 MB: failed')
        else:
            print('A single result document exceeds 16 MB: OK')

The full code is at https://github.com/Jay54520/playground/tree/master/mongodb_size_limit
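To see why the first test passes and the second fails, the encoded sizes can be estimated by hand. The sketch below is not part of the original test code: `bson_size` and `MAX_BSON_SIZE` are hypothetical names, the helper covers only a handful of types, and it assumes 10mb.txt holds exactly 10 MiB of ASCII text. In real code the driver's own BSON encoder would be the better tool.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MiB BSON document limit


def bson_size(value):
    """Estimate the BSON-encoded size in bytes of a value (hypothetical helper).

    Handles only None, bool, int, float, str, list and dict.
    """
    if value is None:
        return 0  # null carries no payload
    if isinstance(value, bool):
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        # int32 length prefix + UTF-8 bytes + trailing NUL
        return 4 + len(value.encode('utf-8')) + 1
    if isinstance(value, list):
        # arrays are encoded as documents keyed "0", "1", ...
        value = {str(i): item for i, item in enumerate(value)}
    if isinstance(value, dict):
        # each element: type byte + key bytes + NUL + payload
        elements = sum(
            1 + len(key.encode('utf-8')) + 1 + bson_size(item)
            for key, item in value.items()
        )
        # int32 total length + elements + trailing NUL
        return 4 + elements + 1
    raise TypeError('unsupported type: %r' % type(value))


content = 'a' * (10 * 1024 * 1024)  # assumption: 10 MiB of ASCII text

# Shape of each result document in test_two_aggregate_result
per_file = {'_id': 1, 'content': content}
# Shape of the single result document in test_one_aggregate_result
pushed = {'_id': None, 'content': [content, content]}

print(bson_size(per_file) <= MAX_BSON_SIZE)  # each grouped document fits
print(bson_size(pushed) <= MAX_BSON_SIZE)    # the $push document does not
```

Under these assumptions the $push document comes to roughly 20 MiB, the same order as the 20971635 bytes reported in the pymongo 3.7.0 error; the two figures are not expected to match exactly. The upper bound in that message, 16793600, equals 16 MiB plus 16 KiB.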

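While searching, I also found people claiming that allowDiskUse lifts this limit. That is wrong. allowDiskUse keeps a pipeline stage from failing when its memory use exceeds 100 MB, whereas the limit above applies to each single document in the result.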

Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.[2]
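For reference, this is roughly where the option goes in a pymongo call. It is a minimal sketch rather than code from the original repository: `aggregate_all_content` is a hypothetical wrapper, and it assumes `coll` is a pymongo collection (or anything with a compatible `aggregate` method). pymongo accepts `allowDiskUse` as a keyword argument to `aggregate`.

```python
def aggregate_all_content(coll, allow_disk_use=False):
    """Push every 'content' value into one result document (hypothetical helper).

    allowDiskUse lets pipeline stages spill to temporary files once they
    pass the 100 MB memory limit. It does not change the 16 MB limit on
    each document in the result.
    """
    pipeline = [
        {'$group': {'_id': None, 'content': {'$push': '$content'}}}
    ]
    return list(coll.aggregate(pipeline, allowDiskUse=allow_disk_use))


# Usage against the test collection above (needs a running MongoDB server):
# from pymongo import MongoClient
# coll = MongoClient()['test-database']['test-collection']
# aggregate_all_content(coll, allow_disk_use=True)
```

Going by the result of test_one_aggregate_result, this call should still fail on the two 10 MB documents even with `allow_disk_use=True`, because the single grouped document is itself over 16 MB.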

References

  1. https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/#result-size-restrictions
  2. https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/#memory-restrictions

