MongoDB count() is very slow. How can we refine or work around it?
I am currently using MongoDB with millions of data records. I discovered one thing that's pretty annoying.

When I use the count() function on a small queried data set, it's very fast. However, when the queried data set contains thousands or even millions of records, the entire system becomes very slow.

I made sure that I have indexed the required fields.

Has anybody encountered the same thing? What do you do to improve it?
Comments (5)
There is now another optimization besides creating a proper index.

If you need some counters, I suggest precalculating them whenever possible, by using the atomic $inc operation instead of count({}) at all. That said, the MongoDB folks are working hard on this, so count({}) improvements are planned for MongoDB 2.1, according to the JIRA ticket.
You can ensure that the index is really used, without any disk access.

Let's say you want to count records with name : "Andrei". You ensure an index on name (as you've done), and you can check that it is the fastest way to count (except for precomputing) by checking whether the query's explain() output displays an indexOnly field set to true. This trick will ensure that your query retrieves records only from RAM (the index) and not from disk.
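To see why such an index-only ("covered") count avoids disk access, here is a toy Python illustration, assuming nothing beyond the idea in the answer: the index alone already contains enough information to answer the count, so the documents themselves are never touched.

```python
from collections import defaultdict

# Toy illustration of a covered count. `documents` stands in for records
# on disk; `name_index` stands in for the index on "name".
documents = [
    {"_id": 1, "name": "Andrei"},
    {"_id": 2, "name": "Bob"},
    {"_id": 3, "name": "Andrei"},
]

# MongoDB maintains the index on every write; here we build it once.
name_index = defaultdict(list)
for doc in documents:
    name_index[doc["name"]].append(doc["_id"])

# Counting by name reads only the index -- no document is fetched:
print(len(name_index["Andrei"]))  # 2
```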
You are pretty much out of luck for now; count in MongoDB is awful and won't be getting better in the near future. See: https://jira.mongodb.org/browse/SERVER-1752

From experience, you should pretty much never use it unless it's a one-time thing, something that occurs very rarely, or your database is pretty small.

As @Andrew Orsich stated, use counters whenever possible (the downside of counters is the global write lock, but they're still better than count()).
For me the solution was changing the index to sparse. It depends on your specific situation; just give it a try if you can.

318,000 records in the collection.
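A sparse index only holds entries for documents that actually have the indexed field, which is why switching can shrink the index dramatically when the field is rare. A toy Python sketch of that size difference (the field name "tag" and the counts are made up for illustration):

```python
# Toy sketch of why a sparse index can be much smaller: it skips documents
# that lack the indexed field entirely. The field name "tag" is made up.
docs = [{"_id": i} for i in range(1000)]                    # field absent
docs += [{"_id": 1000 + i, "tag": "x"} for i in range(10)]  # field present

# A dense index keeps one entry per document, even for missing values;
# a sparse index keeps entries only where the field exists.
dense_index = [(d.get("tag"), d["_id"]) for d in docs]
sparse_index = [(d["tag"], d["_id"]) for d in docs if "tag" in d]

print(len(dense_index), len(sparse_index))  # 1010 10
```

A smaller index means fewer entries to walk when counting, which is consistent with the speedup the answer reports.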
Adding my observations based on the latest version of MongoDB, 4.4. My collection size is 0.80 TB. I created an index (UserObject.CountryID) for my collection, and ran this query. On the 0.80 TB collection, counting ~13 million (13,000,000) records took … ms in total, counting ~35 million (35,000,000) records took 16,274 ms, and counting ~42 million (42,000,000) records took 41,615 ms.