mongodb $ group＆＃x2B; $ sum聚集非常慢

发布于 2025-02-05 20:21:40 字数 3961 浏览 1 评论 0原文

我在MongoDB中有一个聚合查询：

[{
    $group: {
        _id: '$status',
        status: {
            $sum: 1
        }
    }
}]

它正在运行一个约8000万个文件的收藏品。状态字段是索引的，但是查询非常慢，并且运行约60秒或更长时间。

我在查询上做了一个dimend（），但几乎没有任何地方：

{
        "explainVersion" : "1",
        "stages" : [
                {
                        "$cursor" : {
                                "queryPlanner" : {
                                        "namespace" : "loa.document",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "B9878693",
                                        "planCacheKey" : "8EAA28C6",
                                        "maxIndexedOrSolutionsReached" : false,
                                        "maxIndexedAndSolutionsReached" : false,
                                        "maxScansToExplodeReached" : false,
                                        "winningPlan" : {
                                                "stage" : "PROJECTION_SIMPLE",
                                                "transformBy" : {
                                                        "status" : 1,
                                                        "_id" : 0
                                                },
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        }
                },
                {
                        "$group" : {
                                "_id" : "$status",
                                "status" : {
                                        "$sum" : {
                                                "$const" : 1
                                        }
                                }
                        }
                }
        ],
        "serverInfo" : {
                "host" : "rack-compute-2",
                "port" : 27017,
                "version" : "5.0.6",
                "gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "command" : {
                "aggregate" : "document",
                "pipeline" : [
                        {
                                "$group" : {
                                        "_id" : "$status",
                                        "status" : {
                                                "$sum" : 1
                                        }
                                }
                        }
                ],
                "explain" : true,
                "cursor" : {

                },
                "lsid" : {
                        "id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
                },
                "$db" : "loa"
        },
        "ok" : 1
}

我看到它做了一个完整的collscan，我只是不明白为什么。

我计划在该系列中支持几亿（甚至十亿）文件，但是这个问题似乎没有理由劫持我的计划。

原文

I have an aggregation query in MongoDB:

[{
    $group: {
        _id: '$status',
        status: {
            $sum: 1
        }
    }
}]

It is running on a collection that has ~80 million documents. The status field is indexed, yet the query is very slow and runs for around 60 seconds or more.

I did an explain() on the query, but still got almost nowhere:

{
        "explainVersion" : "1",
        "stages" : [
                {
                        "$cursor" : {
                                "queryPlanner" : {
                                        "namespace" : "loa.document",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "B9878693",
                                        "planCacheKey" : "8EAA28C6",
                                        "maxIndexedOrSolutionsReached" : false,
                                        "maxIndexedAndSolutionsReached" : false,
                                        "maxScansToExplodeReached" : false,
                                        "winningPlan" : {
                                                "stage" : "PROJECTION_SIMPLE",
                                                "transformBy" : {
                                                        "status" : 1,
                                                        "_id" : 0
                                                },
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        }
                },
                {
                        "$group" : {
                                "_id" : "$status",
                                "status" : {
                                        "$sum" : {
                                                "$const" : 1
                                        }
                                }
                        }
                }
        ],
        "serverInfo" : {
                "host" : "rack-compute-2",
                "port" : 27017,
                "version" : "5.0.6",
                "gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "command" : {
                "aggregate" : "document",
                "pipeline" : [
                        {
                                "$group" : {
                                        "_id" : "$status",
                                        "status" : {
                                                "$sum" : 1
                                        }
                                }
                        }
                ],
                "explain" : true,
                "cursor" : {

                },
                "lsid" : {
                        "id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
                },
                "$db" : "loa"
        },
        "ok" : 1
}

I see that it does a full COLLSCAN, I just can't understand why.

I plan on supporting a couple hundred million (or even a billion) documents in that collection, but this problem hijacks my plans for seemingly no reason.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

孤檠 2025-02-12 20:21:41

您可以建议查询计划者使用索引，如下所示：

db.test.explain("executionStats").aggregate(
   [
     {$group:{ _id:"$status" ,status:{$sum:1} }}
   ],
     {hint:"status_1"}
   )

确保提示中的索引名称与创建的...
（db.test.getIndexes（）将向您显示确切的索引名称）

You can advice the query planner to use the index as follow:

db.test.explain("executionStats").aggregate(
   [
     {$group:{ _id:"$status" ,status:{$sum:1} }}
   ],
     {hint:"status_1"}
   )

Make sure the index name in the hint is same as created ...
(db.test.getIndexes() will show you the exact index name )

回复收藏 0 原文

~没有更多了~

关于作者

自此以后，行同陌路

暂无简介

文章

776 人气

关注发私信

alipaysp_snBf0MSZIv

文章 0 评论 0

关注

梦断已成空

文章 0 评论 0

关注

瞎闹

文章 0 评论 0

关注

凯凯我们等你回来

文章 0 评论 0

关注

寄意

文章 0 评论 0

关注

似梦非梦

文章 0 评论 0

友情链接

文江博客

mongodb $ group＆＃x2B; $ sum聚集非常慢

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

mongodb $ group＆＃x2B; $ sum聚集非常慢

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。