mongodb $ group+ $ sum聚集非常慢

发布于 2025-02-05 20:21:40 字数 3961 浏览 1 评论 0原文

我在MongoDB中有一个聚合查询:

[{
    $group: {
        _id: '$status',
        status: {
            $sum: 1
        }
    }
}]

它正在运行一个约8000万个文件的收藏品。状态字段是索引的,但是查询非常慢,并且运行约60秒或更长时间。

我在查询上做了一个dimend(),但几乎没有任何地方:

{
        "explainVersion" : "1",
        "stages" : [
                {
                        "$cursor" : {
                                "queryPlanner" : {
                                        "namespace" : "loa.document",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "B9878693",
                                        "planCacheKey" : "8EAA28C6",
                                        "maxIndexedOrSolutionsReached" : false,
                                        "maxIndexedAndSolutionsReached" : false,
                                        "maxScansToExplodeReached" : false,
                                        "winningPlan" : {
                                                "stage" : "PROJECTION_SIMPLE",
                                                "transformBy" : {
                                                        "status" : 1,
                                                        "_id" : 0
                                                },
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        }
                },
                {
                        "$group" : {
                                "_id" : "$status",
                                "status" : {
                                        "$sum" : {
                                                "$const" : 1
                                        }
                                }
                        }
                }
        ],
        "serverInfo" : {
                "host" : "rack-compute-2",
                "port" : 27017,
                "version" : "5.0.6",
                "gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "command" : {
                "aggregate" : "document",
                "pipeline" : [
                        {
                                "$group" : {
                                        "_id" : "$status",
                                        "status" : {
                                                "$sum" : 1
                                        }
                                }
                        }
                ],
                "explain" : true,
                "cursor" : {

                },
                "lsid" : {
                        "id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
                },
                "$db" : "loa"
        },
        "ok" : 1
}

我看到它做了一个完整的collscan,我只是不明白为什么。

我计划在该系列中支持几亿(甚至十亿)文件,但是这个问题似乎没有理由劫持我的计划。

I have an aggregation query in MongoDB:

[{
    $group: {
        _id: '$status',
        status: {
            $sum: 1
        }
    }
}]

It is running on a collection that has ~80 million documents. The status field is indexed, yet the query is very slow and runs for around 60 seconds or more.

I did an explain() on the query, but still got almost nowhere:

{
        "explainVersion" : "1",
        "stages" : [
                {
                        "$cursor" : {
                                "queryPlanner" : {
                                        "namespace" : "loa.document",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "B9878693",
                                        "planCacheKey" : "8EAA28C6",
                                        "maxIndexedOrSolutionsReached" : false,
                                        "maxIndexedAndSolutionsReached" : false,
                                        "maxScansToExplodeReached" : false,
                                        "winningPlan" : {
                                                "stage" : "PROJECTION_SIMPLE",
                                                "transformBy" : {
                                                        "status" : 1,
                                                        "_id" : 0
                                                },
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        }
                },
                {
                        "$group" : {
                                "_id" : "$status",
                                "status" : {
                                        "$sum" : {
                                                "$const" : 1
                                        }
                                }
                        }
                }
        ],
        "serverInfo" : {
                "host" : "rack-compute-2",
                "port" : 27017,
                "version" : "5.0.6",
                "gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "command" : {
                "aggregate" : "document",
                "pipeline" : [
                        {
                                "$group" : {
                                        "_id" : "$status",
                                        "status" : {
                                                "$sum" : 1
                                        }
                                }
                        }
                ],
                "explain" : true,
                "cursor" : {

                },
                "lsid" : {
                        "id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
                },
                "$db" : "loa"
        },
        "ok" : 1
}

I see that it does a full COLLSCAN, I just can't understand why.

I plan on supporting a couple hundred million (or even a billion) documents in that collection, but this problem hijacks my plans for seemingly no reason.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

孤檠 2025-02-12 20:21:41

您可以建议查询计划者使用索引,如下所示:

db.test.explain("executionStats").aggregate(
   [
     {$group:{ _id:"$status" ,status:{$sum:1} }}
   ],
     {hint:"status_1"}
   )

确保提示中的索引名称与创建的...
(db.test.getIndexes()将向您显示确切的索引名称)

You can advice the query planner to use the index as follow:

db.test.explain("executionStats").aggregate(
   [
     {$group:{ _id:"$status" ,status:{$sum:1} }}
   ],
     {hint:"status_1"}
   )

Make sure the index name in the hint is same as created ...
(db.test.getIndexes() will show you the exact index name )

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文