MongoDB: count the number of occurrences of each value using an index?

Posted 2025-02-05 11:32:11 · word count 935 · 3 views · 0 comments

I want to count, for each value, how many documents exist.

For example, my data may look like:

[
    {"a": 5},
    {"a": 3},
    {"a": 5},
    {"a": 4}
]

and I have an index on the field "a".

I want the output:

[
    {5: 2},
    {3: 1},
    {4: 1}
]

I can achieve this with an aggregation:

db.table.aggregate([
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    }
])
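
Note that this pipeline returns documents of the form { _id: 5, count: 2 } rather than the { 5: 2 } shape shown above. As a minimal sketch, assuming the results have already been collected into an array, a small client-side helper could reshape them. The name toValueCountPairs is hypothetical and not part of MongoDB; numeric values become string keys because JavaScript object keys are strings.

```javascript
// Hypothetical helper (not part of MongoDB): reshape $group output
// documents such as { _id: 5, count: 2 } into { "5": 2 }.
function toValueCountPairs(groupedDocs) {
  return groupedDocs.map(({ _id, count }) => ({ [_id]: count }));
}

// Sample $group output for the example data above.
const grouped = [
  { _id: 5, count: 2 },
  { _id: 3, count: 1 },
  { _id: 4, count: 1 },
];

const pairs = toValueCountPairs(grouped);
console.log(JSON.stringify(pairs));
```

In the mongo shell, the input array could come from calling .toArray() on the cursor returned by db.table.aggregate([...]).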

However, on millions of documents this gets really slow, as $group cannot use an index.

I have tried prefixing it with a $sort stage to force index use, so that $group can take advantage of the sorted documents and be "faster", but the query still takes a very long time.

db.table.aggregate([
    {
        $sort: {
            a: 1
        }
    },
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    }
])

From my understanding of indexes, it should be relatively simple to compute this result if we had direct access to the index.

Is there a way to do this specific query (document count per value) using an index, more efficiently?
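
The intuition in the question can be sketched in plain JavaScript: if the keys are available in sorted order, equal values sit next to each other, so one pass that counts runs is enough. This is only an illustration under that assumption. The sortedKeys array stands in for the key order of an ascending index on "a", the function name countRuns is hypothetical, and MongoDB does not expose its indexes this way.

```javascript
// Illustrative only: `sortedKeys` stands in for the keys of an ascending
// index on "a". This is not an API that MongoDB provides.
function countRuns(sortedKeys) {
  const counts = [];
  for (const key of sortedKeys) {
    const last = counts[counts.length - 1];
    if (last !== undefined && last.value === key) {
      // Same value as the previous key: extend the current run.
      last.count += 1;
    } else {
      // A new value starts a new run.
      counts.push({ value: key, count: 1 });
    }
  }
  return counts;
}

// The example data from the question, in index (sorted) order.
const runCounts = countRuns([3, 4, 5, 5]);
console.log(JSON.stringify(runCounts));
```

Even in this idealized form the pass still visits every key once, so the work grows with the number of documents rather than with the number of distinct values.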

七度光 2025-02-12 11:32:11

As mentioned in the MongoDB documentation, $group can only use an index to find the first document in each group, provided that:

$group is preceded by $sort that sorts the field to group by, and
there is an index on the grouped field that matches the sort order, and
$first is the only accumulator in $group.

So calculating the counts using an index is not possible. If this is an operation that you need to perform frequently, I would recommend storing the output of your pipeline in another collection, using $merge or $out, like this:

db.table.aggregate([
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    },
    { $merge: { into: "groupedData", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
])

Then query this collection directly, and either rerun the above aggregation pipeline periodically to refresh it, or modify your CRUD operations to keep it up to date.
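
The second option, keeping the counts up to date from your CRUD operations, could look roughly like the sketch below. The helper names recordInsert, recordDelete and recordUpdate are hypothetical, and counts is assumed to be a handle to the groupedData collection (for example db.groupedData in the mongo shell). The only MongoDB features relied on are updateOne with $inc and the upsert option.

```javascript
// Hypothetical helpers. `counts` is assumed to be a collection handle such
// as db.groupedData, holding documents of the form { _id: <value>, count: <n> }.

// Call after inserting `doc` into the main collection.
function recordInsert(counts, doc) {
  // upsert creates the counter document the first time a value is seen.
  return counts.updateOne({ _id: doc.a }, { $inc: { count: 1 } }, { upsert: true });
}

// Call after deleting `doc` from the main collection.
function recordDelete(counts, doc) {
  return counts.updateOne({ _id: doc.a }, { $inc: { count: -1 } });
}

// Call after an update that may have changed the value of "a".
function recordUpdate(counts, oldDoc, newDoc) {
  if (oldDoc.a === newDoc.a) {
    return; // The grouped value did not change, so the counts stay the same.
  }
  recordDelete(counts, oldDoc);
  recordInsert(counts, newDoc);
}
```

This assumes synchronous shell-style calls; with a driver that returns promises, each call would need to be awaited. The write to the main collection and the counter update are separate operations, so they can drift apart unless they are wrapped in a transaction or reconciled by rerunning the pipeline from time to time. Counters that drop to zero also stay in the collection as zero-count documents.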
