MongoDB: count the number of occurrences of each value using an index?

Posted on 2025-02-05 11:32:11

I want to count, for each value, how many documents exist.

For example, my data may look like:

[
    {"a": 5},
    {"a": 3},
    {"a": 5},
    {"a": 4}
]

and I have an index on the field "a".

I want the output:

[
    {5: 2},
    {3: 1},
    {4: 1}
]

I can achieve this with an aggregation:

db.table.aggregate([
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    }
])

However, on millions of documents this gets very slow, as $group cannot use an index.

I have tried prefixing it with a $sort stage to force index use, so that $group can take advantage of the sorted documents, but the query still takes a very long time.

db.table.aggregate([
    {
        $sort: {
            a: 1
        }
    },
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    }
])

From my understanding of indexes, this query should be relatively simple to answer if we had direct access to the index.
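To make the idea concrete, here is a minimal sketch of what such an index walk would compute. It is plain JavaScript and not a MongoDB API: `sortedKeys` is a hypothetical stand-in for the ordered key values of the index on "a", and `countRuns` is an illustrative helper name.

```javascript
// Hypothetical helper: count documents per value from keys in index order.
// `sortedKeys` stands in for the ordered entries of the index on "a";
// MongoDB does not expose its index this way.
function countRuns(sortedKeys) {
    const counts = [];
    for (const key of sortedKeys) {
        const last = counts[counts.length - 1];
        if (last !== undefined && last._id === key) {
            last.count += 1;                      // same value as the previous entry: extend the run
        } else {
            counts.push({ _id: key, count: 1 });  // new value: start a new run
        }
    }
    return counts;
}

// The sample data from above, in index order: 3, 4, 5, 5
console.log(countRuns([3, 4, 5, 5]));
```

Even in this form the walk visits every index entry once, so it is still linear in the number of documents; what it would save is loading the documents themselves.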

Is there a way to run this specific query (document count per value) more efficiently using an index?

Comments (1)

七度光 2025-02-12 11:32:11

As mentioned in the documentation here, $group can only use an index to find the first document in each group, provided:

  1. $group is preceded by a $sort stage that sorts on the field being grouped by, and
  2. there is an index on the grouped field that matches the sort order, and
  3. $first is the only accumulator in $group.
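For illustration, a pipeline that meets all three conditions might look like the sketch below, assuming the index on "a" is { a: 1 }. The output field name `firstA` and the variable name `firstPerGroup` are made up for this example, and the `db` guard is only there so the snippet can also be loaded outside mongosh.

```javascript
// Sketch of a pipeline that satisfies the three conditions,
// assuming an ascending index { a: 1 } on the collection.
// `firstA` is an illustrative output field name.
const firstPerGroup = [
    { $sort: { a: 1 } },                                  // sort on the grouping field, matching the index
    { $group: { _id: "$a", firstA: { $first: "$a" } } }   // $first is the only accumulator
];

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
    printjson(db.table.aggregate(firstPerGroup).toArray());
}
```

This returns one document per distinct value of "a" but no counts. Replacing $first with $sum, as in the question, means the third condition no longer holds.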

So it is not possible to calculate counts using an index. If this is an operation that you need to perform frequently, I recommend storing the output of your pipeline in another collection using $merge or $out, like this:

db.table.aggregate([
    {
        $group: {
            _id: "$a",
            count: {$sum: 1}
        }
    },
    { $merge: { into: "groupedData", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
])

Then query this collection directly, and either run the above aggregation pipeline periodically to refresh it, or modify your CRUD operations to keep it up to date.
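As a rough sketch of the second option, the helpers below build bulkWrite operations that adjust the matching counter in "groupedData" whenever a document is inserted, deleted, or has its "a" value changed. The helper names (`opsForInsert`, `opsForDelete`, `opsForChange`) are hypothetical, and the sketch assumes "a" holds a primitive value such as a number or string.

```javascript
// Hypothetical helpers that build bulkWrite operations for "groupedData".
// Each keeps documents of the form { _id: <value of a>, count: <n> }
// in step with a single write to the main collection.
function opsForInsert(doc) {
    // Increment the counter for doc.a, creating it on first use.
    return [{ updateOne: { filter: { _id: doc.a }, update: { $inc: { count: 1 } }, upsert: true } }];
}

function opsForDelete(doc) {
    // Decrement the counter for doc.a; a counter may be left at 0.
    return [{ updateOne: { filter: { _id: doc.a }, update: { $inc: { count: -1 } } } }];
}

function opsForChange(oldDoc, newDoc) {
    // Nothing to do when "a" is unchanged (compares primitive values only).
    if (oldDoc.a === newDoc.a) return [];
    return [...opsForDelete(oldDoc), ...opsForInsert(newDoc)];
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
    const doc = { a: 5 };
    db.table.insertOne(doc);
    db.groupedData.bulkWrite(opsForInsert(doc));
}
```

Note that the write to the main collection and the counter update are separate operations here, so the counts can drift if one of them fails; running the $merge pipeline occasionally would bring them back in line.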
