Slow MongoDB writes cause socket timeout exceptions
I am having performance issues with MongoDB.
Running on:
- MongoDB 2.0.1
- Windows 2008 R2
- 12 GB RAM
- 2 TB HDD (5400 rpm)
I've written a daemon that removes and inserts records asynchronously. Each hour most of the collections are cleared and refilled with new data (10-12 million deletes and 10-12 million inserts). The daemon uses roughly 60-80% of the CPU while inserting the data (because it solves more than a million knapsack problems). When I start the daemon it runs for about 1-2 minutes before it crashes with a socket timeout while writing data to the MongoDB server.
When I look in the logs I see that it takes about 30 seconds to remove the data in a collection. It seems to be related to CPU load and memory usage, because when I run the daemon on a different PC everything works fine.
Is there any optimization possible, or am I bound to using a separate PC for running the daemon (or to picking another document store)?
UPDATE 11/13/2011 18:44 GMT+1
Still having problems. I've made some modifications to my daemon and decreased the number of concurrent writes. However, the daemon still crashes when memory is nearly full (11.8 GB of 12 GB) and the machine receives more load (loading data into the frontend). The crash happens because MongoDB responds slowly: an insert or remove takes around 30 seconds, and the daemon hits a socket timeout exception. Of course there should be try/catch statements to catch such exceptions, but it should not happen in the first place. I'm looking for a solution to this issue rather than a workaround.
- Total storage size is: 8.1 GB
- Index size is: 2.1 GB
I guess the problem is that the working set plus indexes are too large to fit in memory, so MongoDB needs to access the HDD (a slow 5400 rpm drive). But why would this be a problem? Aren't there other strategies for storing the collections (e.g. in separate files instead of large 2 GB chunks)? If a relational database can read and write data from disk in an acceptable amount of time, why can't MongoDB?
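The memory figures above can be put side by side with a little arithmetic. The sketch below is in Python rather than C# so it runs on its own. It assumes the worst case, where all data and indexes need to be resident at once; the real working set may be smaller. The post does not state how much memory the daemon and the operating system use, so that share is left out.

```python
# Figures taken from the post; treating storage + index size as the
# resident set is an assumption (an upper bound on the working set).
RAM_GB = 12.0
STORAGE_GB = 8.1
INDEX_GB = 2.1


def headroom_gb(ram, storage, index):
    """RAM left over if all data and indexes were held in memory."""
    return round(ram - (storage + index), 1)


if __name__ == "__main__":
    # Whatever is left must also cover the OS and the CPU-heavy daemon.
    print(headroom_gb(RAM_GB, STORAGE_GB, INDEX_GB))  # prints 1.8
```

With less than 2 GB to spare on a machine that also runs the daemon, it is plausible that pages get pushed out to the slow disk, which matches the guess in the paragraph above.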
UPDATE 11/15/2011 00:04 GMT+1
Log file to illustrate the issue:
00:02:46 [conn3] insert bargains.auction-history-eu-bloodhoof-horde 421ms
00:02:47 [conn6] insert bargains.auction-history-eu-blackhand-horde 1357ms
00:02:48 [conn3] insert bargains.auction-history-eu-bloodhoof-alliance 577ms
00:02:48 [conn6] insert bargains.auction-history-eu-blackhand-alliance 499ms
00:02:49 [conn4] remove bargains.crafts-eu-agamaggan-horde 34881ms
00:02:49 [conn5] remove bargains.crafts-eu-aggramar-horde 3135ms
00:02:49 [conn5] insert bargains.crafts-eu-aggramar-horde 234ms
00:02:50 [conn2] remove bargains.auctions-eu-aerie-peak-horde 36223ms
00:02:52 [conn5] remove bargains.auctions-eu-aegwynn-horde 1700ms
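To make the pattern in the log excerpt above easier to see, here is a minimal Python sketch that totals the time spent per operation type. The regular expression is an assumption: it only matches the abbreviated line shape shown in the excerpt, not the full mongod log format.

```python
import re
from collections import defaultdict

# Matches the abbreviated shape shown in the excerpt, e.g.
# 00:02:49 [conn4] remove bargains.crafts-eu-agamaggan-horde 34881ms
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<conn>\w+)\] "
    r"(?P<op>\w+) (?P<ns>\S+) (?P<ms>\d+)ms$"
)

LOG = """\
00:02:46 [conn3] insert bargains.auction-history-eu-bloodhoof-horde 421ms
00:02:47 [conn6] insert bargains.auction-history-eu-blackhand-horde 1357ms
00:02:48 [conn3] insert bargains.auction-history-eu-bloodhoof-alliance 577ms
00:02:48 [conn6] insert bargains.auction-history-eu-blackhand-alliance 499ms
00:02:49 [conn4] remove bargains.crafts-eu-agamaggan-horde 34881ms
00:02:49 [conn5] remove bargains.crafts-eu-aggramar-horde 3135ms
00:02:49 [conn5] insert bargains.crafts-eu-aggramar-horde 234ms
00:02:50 [conn2] remove bargains.auctions-eu-aerie-peak-horde 36223ms
00:02:52 [conn5] remove bargains.auctions-eu-aegwynn-horde 1700ms
"""


def parse_line(line):
    """Return (op, namespace, millis), or None if the line has another shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("op"), match.group("ns"), int(match.group("ms"))


def summarize(lines):
    """Total and worst-case duration per operation type, in milliseconds."""
    totals = defaultdict(int)
    worst = defaultdict(int)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not look like the excerpt
        op, _namespace, millis = parsed
        totals[op] += millis
        worst[op] = max(worst[op], millis)
    return dict(totals), dict(worst)


if __name__ == "__main__":
    totals, worst = summarize(LOG.splitlines())
    for op in sorted(totals):
        print(op, "total", totals[op], "ms, worst", worst[op], "ms")
```

On this excerpt the four removes add up to 75939 ms against 3088 ms for the five inserts, so the removes are where the time goes.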
UPDATE 11/18/2011 10:41 GMT+1
After posting this issue in the mongodb user group we found out that "drop" wasn't being issued. Drop is much faster than a full remove of all records.
I am using the official mongodb-csharp-driver. I issued the command collection.Drop(); however, it didn't work, so for the time being I use this:
public void Clear()
{
    if (collection.Exists())
    {
        var command = new CommandDocument {
            { "drop", collectionName }
        };
        collection.Database.RunCommand(command);
    }
}
The daemon is quite stable now, but I still have to find out why the collection.Drop() method doesn't work as it is supposed to, since the driver uses the native drop command as well.
Comments (5)
Some optimizations may be possible:
- Make sure your mongodb is not running in verbose mode. This ensures minimal logging and hence minimal I/O; otherwise it writes every operation to a log file.
- If the application logic allows it, convert your inserts to bulk inserts. Bulk insert is supported in most mongodb drivers. http://www.mongodb.org/display/DOCS/Inserting#Inserting-Bulkinserts
- Instead of one remove operation per record, try to remove in bulk. E.g. collect the "_id" of 1000 documents, then fire a single remove query using the $in operator. You will send 1000 times fewer queries to MongoDB.
- If you are removing and inserting the same document to refresh its data, consider an update instead.
What kind of daemon are you running? If you can share more info on that, it may be possible to optimize it too and reduce the CPU load.
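The batched remove suggested above can be sketched independently of any driver. The Python below only builds the filter documents; `chunked` and `build_remove_filters` are hypothetical helper names, and passing each filter to the driver's remove call is left out because it depends on the driver in use.

```python
from itertools import islice


def chunked(ids, size=1000):
    """Yield consecutive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_remove_filters(ids, size=1000):
    """Build one {"_id": {"$in": [...]}} filter document per batch of ids."""
    return [{"_id": {"$in": batch}} for batch in chunked(ids, size)]


if __name__ == "__main__":
    filters = build_remove_filters(range(2500), size=1000)
    print(len(filters))  # prints 3: three remove queries instead of 2500
    print([len(f["_id"]["$in"]) for f in filters])  # prints [1000, 1000, 500]
```

The batch size of 1000 comes from the answer; the best value in practice depends on document size and on how long the server can afford to spend on a single remove.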
It could be totally unrelated, but there was an issue in 2.0.0 that had to do with CPU consumption: "After upgrade to 2.0.0 mongo starts consuming all cpu resources locking the system, complains of memory leak".
Unless I have misunderstood, your application is crashing, not mongod. Have you tried removing MongoDB from the picture and replacing the writes to MongoDB with, say, writes to the file system? Maybe this will shed light on some other issue inside your application that is not specifically related to MongoDB.
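One way to try this is to give the daemon a stand-in writer that appends to a local file instead of talking to MongoDB. The Python sketch below is an illustration only: `FileSink`, `insert_batch`, `clear` and `read_all` are hypothetical names, not part of any driver, and the real daemon in the question is written in C#.

```python
import json
import os
import tempfile


class FileSink:
    """Hypothetical stand-in for the MongoDB writer: appends JSON lines to a file."""

    def __init__(self, path):
        self.path = path

    def insert_batch(self, docs):
        """Write one JSON document per line and return how many were written."""
        count = 0
        with open(self.path, "a", encoding="utf-8") as handle:
            for doc in docs:
                handle.write(json.dumps(doc) + "\n")
                count += 1
        return count

    def clear(self):
        """Rough analogue of clearing a collection: delete the file if present."""
        if os.path.exists(self.path):
            os.remove(self.path)


def read_all(path):
    """Read every JSON line back, for checking what the daemon wrote."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "crafts-eu-aggramar-horde.jsonl")
    sink = FileSink(path)
    sink.clear()
    sink.insert_batch([{"_id": 1, "price": 10}, {"_id": 2, "price": 12}])
    print(len(read_all(path)))
```

If the daemon still stalls or crashes with such a sink in place, the cause lies in the application rather than in MongoDB.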
I had something similar happen with SQL Server 2008 on Windows Server 2008 R2. For me, it ended up being the network card. The NIC was set to auto-sense the connection speed, which led to occasional dropped packets, which in turn caused the socket timeout problems. To test, ping the box from your local workstation and kick off your process to load the Windows 2008 R2 server. If this is the problem, you'll eventually start to see timeouts on your ping command.
The solution ended up being to explicitly set the NIC connection speed:
Manage Computer > Device Manager > Network Adapters > Properties. Depending on the NIC, you'll either have a link speed setting tab or have to go into another menu. Set this to exactly the speed of the network it is connected to. In my DEV environment it ended up being 100Mbps half duplex.
These types of problems, as you know, can be a real pain to track down!
Best of luck figuring it out.
The daemon is stable now. After posting this issue in the mongodb user group we found out that "drop" wasn't being issued. Drop is much faster than a full remove of all records.