MongoDB count() is very slow. How can we refine or work around it?
I am currently using MongoDB with millions of data records. I discovered one thing that's pretty annoying.

When I use the count() function on a small queried data set, it's very fast. However, when the queried data set contains thousands or even millions of records, the entire system becomes very slow.

I made sure that I have indexed the required fields.

Has anybody encountered the same thing? What do you do to improve it?
Comments (5)
There is now another optimization besides creating a proper index.

If you need some counters, I suggest precalculating them whenever possible, by using the atomic $inc operation instead of count({}) at all. That said, the MongoDB folks are working hard on this, so count({}) improvements are planned for MongoDB 2.1, according to the JIRA ticket.
You can ensure that the index is really used, without any disk access.

Let's say you want to count records with name : "Andrei". You ensure an index on name (as you've done), and you can check that it is the fastest way to count (except for precomputing) by checking whether the query's explain() output displays an indexOnly field set to true. This trick will ensure that your query retrieves records only from RAM (the index) and not from disk.
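To see why such an index-only ("covered") count avoids disk access, here is a toy Python illustration, assuming nothing beyond the idea in the answer: the index alone already contains enough information to answer the count, so the documents themselves are never touched.

```python
from collections import defaultdict

# Toy illustration of a covered count. `documents` stands in for records
# on disk; `name_index` stands in for the index on "name".
documents = [
    {"_id": 1, "name": "Andrei"},
    {"_id": 2, "name": "Bob"},
    {"_id": 3, "name": "Andrei"},
]

# MongoDB maintains the index on every write; here we build it once.
name_index = defaultdict(list)
for doc in documents:
    name_index[doc["name"]].append(doc["_id"])

# Counting by name reads only the index -- no document is fetched:
print(len(name_index["Andrei"]))  # 2
```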
You are pretty much out of luck for now; count in MongoDB is awful and won't be getting better in the near future. See: https://jira.mongodb.org/browse/SERVER-1752

From experience, you should pretty much never use it unless it's a one-time thing, something that occurs very rarely, or your database is pretty small.

As @Andrew Orsich stated, use counters whenever possible (the downside of counters is the global write lock, but they're still better than count()).
For me the solution was changing the index to sparse. It depends on your specific situation; just give it a try if you can.

318,000 records in the collection.
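A sparse index only holds entries for documents that actually have the indexed field, which is why switching can shrink the index dramatically when the field is rare. A toy Python sketch of that size difference (the field name "tag" and the counts are made up for illustration):

```python
# Toy sketch of why a sparse index can be much smaller: it skips documents
# that lack the indexed field entirely. The field name "tag" is made up.
docs = [{"_id": i} for i in range(1000)]                    # field absent
docs += [{"_id": 1000 + i, "tag": "x"} for i in range(10)]  # field present

# A dense index keeps one entry per document, even for missing values;
# a sparse index keeps entries only where the field exists.
dense_index = [(d.get("tag"), d["_id"]) for d in docs]
sparse_index = [(d["tag"], d["_id"]) for d in docs if "tag" in d]

print(len(dense_index), len(sparse_index))  # 1010 10
```

A smaller index means fewer entries to walk when counting, which is consistent with the speedup the answer reports.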
Adding my observations based on the latest version of MongoDB, 4.4. My collection size is 0.80 TB. I created an index (UserObject.CountryID) for my collection, and ran this query. On the 0.80 TB collection, counting ~13 million (13,000,000) records took … ms in total, counting ~35 million (35,000,000) records took 16,274 ms, and counting ~42 million (42,000,000) records took 41,615 ms.