MongoDB: count the number of occurrences of each value using an index?
I want to count, for each value, how many documents exist.
For example, my data may look like:
[
{"a": 5},
{"a": 3},
{"a": 5},
{"a": 4}
]
and I have an index on the field "a".
I want the output:
[
{5: 2},
{3: 1},
{4: 1}
]
I can achieve this with an aggregation:
db.table.aggregate([
{
$group: {
_id: "$a",
count: {$sum: 1}
}
}
])
However, on millions of documents this gets really slow, as $group cannot use an index.
I have tried prefixing it with a $sort stage to force index use, so that $group can take advantage of the sorted documents and run faster, but the query still takes a very long time.
db.table.aggregate([
{
$sort: {
a: 1
}
},
{
$group: {
_id: "$a",
count: {$sum: 1}
}
}
])
From how I understand indexes, it should be relatively simple (if we had direct access to the index) to simulate this query.
Is there a way to do this specific query (document count per value) using an index, more efficiently?
Comments (1)
As mentioned in the documentation, $group can only use an index to find the first document in each group, provided that $group is preceded by a $sort that sorts on the field to group by, and $first is the only accumulator in $group.
So it is not possible to calculate counts using an index. If this is an operation that you need to perform frequently, I recommend storing the output of your pipeline in another collection, using $merge or $out. Then query that collection directly, and either run the above aggregation pipeline periodically to update it, or modify your CRUD operations to keep it up to date.
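The answer's original code example did not survive, so the following is only a rough sketch of what the recommendation could look like, not the answerer's actual code. It assumes a mongosh-style db handle; the collection name a_counts and the helper names buildCountsPipeline, refreshCounts and adjustCount are made up for illustration.

```javascript
// Hypothetical name for the collection that stores the precomputed counts.
const COUNTS_COLLECTION = "a_counts";

// Build the pipeline: count documents per value of "a", then write one
// document per value ({ _id: <value>, count: <n> }) into the target collection.
function buildCountsPipeline(target = COUNTS_COLLECTION) {
  return [
    { $group: { _id: "$a", count: { $sum: 1 } } },
    {
      $merge: {
        into: target,
        on: "_id",
        whenMatched: "replace",   // overwrite the stored count for a known value
        whenNotMatched: "insert"  // add a document for a value seen for the first time
      }
    }
  ];
}

// Option 1: periodic refresh, e.g. from a scheduled job.
// The full $group still runs here, but only once per refresh instead of per read.
function refreshCounts(db) {
  // toArray() drains the cursor so the pipeline is actually executed.
  return db.table.aggregate(buildCountsPipeline()).toArray();
}

// Option 2: keep the counts current from the CRUD path.
// Call with delta = 1 after inserting a document with a === value,
// and with delta = -1 after deleting one.
function adjustCount(db, value, delta) {
  return db.getCollection(COUNTS_COLLECTION).updateOne(
    { _id: value },
    { $inc: { count: delta } },
    { upsert: true } // create the counter document if it does not exist yet
  );
}
```

In the shell this would be used as refreshCounts(db), after which db.a_counts.find() returns the per-value counts without touching the large collection. One thing to keep in mind with this sketch: $merge does not delete counts for values that no longer occur in the source collection, whereas replacing the final stage with { $out: "a_counts" } rewrites the whole target collection on each run.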