Restart / auto-repair MongoDB in production
I want an /etc/init.d script that starts MongoDB more reliably, even after it went down hard: it should attempt an automatic repair if the database is left in a locked state.
Yes, I could script this myself, but I think somebody out there must have done this already.
I noticed that after a server goes down hard, MongoDB is left in a state where it doesn't restart via the /etc/init.d/mongod script. Apparently the lock file(s) need to be removed, and mongod needs to be run with the --repair option and the correct --dbpath before it can be restarted successfully. In some cases one also needs to change the ownership of the db files to the user who runs MongoDB. An additional problem is that the standard /etc/init.d/mongod script does not report a failure in this situation; it happily and incorrectly returns an "OK" status, reporting that mongod was started although it wasn't.
$ sudo /etc/init.d/mongod start
Starting mongod: forked process: 9220
all output going to: /data/mongo/log/mongod.log
[ OK ]
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
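One way an init script could avoid the false "OK" shown above is to wait briefly after forking and then check that the process is still alive. The following is only a minimal sketch under assumptions: the pid file path and the wait time are hypothetical defaults, not values taken from the stock script.

```shell
#!/bin/sh
# Sketch: confirm that the PID recorded in a pid file is still alive a few
# seconds after "start", instead of trusting the fork alone.
# PIDFILE and WAIT_SECS are hypothetical defaults, not values from the stock script.
PIDFILE="${PIDFILE:-/var/run/mongodb/mongod.pid}"
WAIT_SECS="${WAIT_SECS:-5}"

# Succeeds only if the pid file is non-empty and its process answers signal 0.
pid_alive() {
    [ -s "$1" ] || return 1
    pid=$(cat "$1")
    kill -0 "$pid" 2>/dev/null
}

# usage: verify_started PIDFILE WAIT_SECS
# Give mongod a moment to either settle or die, then report honestly.
verify_started() {
    sleep "$2"
    if pid_alive "$1"; then
        echo "OK"
    else
        echo "FAILED"
        return 1
    fi
}
```

In a start function this would be called as `verify_started "$PIDFILE" "$WAIT_SECS"` right after launching mongod, so the script's exit status reflects whether the daemon survived startup.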
The OS is either CentOS or Fedora.
Does anybody have modified /etc/init.d scripts, or a pointer to such scripts, that attempt a repair automatically in that situation? Or is there another tool that functions as a watchdog for mongod?
Any opinions on why it might be a bad idea to try to automatically repair MongoDB?
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
$ sudo ls -l /var/lib/mongo/mongod.lock
-rw-r--r--. 1 mongod mongod 5 Nov 19 11:52 /var/lib/mongo/mongod.lock
$ sudo tail -50 /data/mongo/log/mongod.log
**************
old lock file: /data/mongo/db/mongod.lock. probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Sat Nov 19 11:55:44 exception in initAndListen std::exception: old lock file, terminating
Sat Nov 19 11:55:44 dbexit:
Sat Nov 19 11:55:44 shutdown: going to close listening sockets...
Sat Nov 19 11:55:44 shutdown: going to flush oplog...
Sat Nov 19 11:55:44 shutdown: going to close sockets...
Sat Nov 19 11:55:44 shutdown: waiting for fs preallocator...
Sat Nov 19 11:55:44 shutdown: closing all files...
Sat Nov 19 11:55:44 closeAllFiles() finished
Sat Nov 19 11:55:44 dbexit: really exiting now
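The manual steps described in the question and recommended by the log above (remove the lock file, run --repair against the right --dbpath, fix ownership) could be wrapped roughly as follows. This is a sketch, not a tested production script: DBPATH, MONGO_USER and DRY_RUN are hypothetical settings, and it assumes the caller has already made sure no mongod is running against that dbpath. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual recovery steps from the question, wrapped in a function.
# DBPATH, MONGO_USER and DRY_RUN are hypothetical settings for illustration.
DBPATH="${DBPATH:-/var/lib/mongo}"
MONGO_USER="${MONGO_USER:-mongod}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# True if mongod.lock exists and is non-empty (as in the ls output above).
# The caller must make sure no mongod is currently using this dbpath.
needs_repair() {
    [ -s "$1/mongod.lock" ]
}

# usage: repair_if_needed DBPATH
repair_if_needed() {
    dbpath="$1"
    if ! needs_repair "$dbpath"; then
        echo "no stale lock in $dbpath, nothing to do"
        return 0
    fi
    run rm -f "$dbpath/mongod.lock" &&
    run mongod --repair --dbpath "$dbpath" &&
    run chown -R "$MONGO_USER:$MONGO_USER" "$dbpath"
}
```

Calling `repair_if_needed "$DBPATH"` with DRY_RUN=1 shows what would be done; setting DRY_RUN=0 executes the steps in order and stops at the first failure.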
Comments (2)
So the first thing to mention is journaling. Journaling is effectively billed as "fast repair". Journaling is on by default in 2.0+ and it performs that "repair" automatically at startup.
So if your disks can handle the extra write-throughput of journaling this may solve your problem.
The #1 issue with repairing MongoDB automatically is simply one of time.
If you have a 200GB database, the system will need to do the following when repairing:
200GB read
200GB write
200GB reads + large number of writes
If you look at my notes, that's a serious amount of drive thrashing to perform a repair.
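To put a rough number on those volumes, here is a small back-of-the-envelope sketch. The throughput figure is purely an assumption for illustration, and the result is a lower bound because it ignores the "large number of writes".

```shell
#!/bin/sh
# Rough lower bound on repair time from the I/O volumes listed above.
# The throughput figure passed in is an assumption, not a measurement.

# usage: estimate_repair_seconds SIZE_GB THROUGHPUT_MB_PER_S
estimate_repair_seconds() {
    size_gb="$1"
    mb_per_s="$2"
    # one read pass + one write pass + a second read pass = 3 passes over the data
    total_mb=$(( size_gb * 3 * 1024 ))
    echo $(( total_mb / mb_per_s ))
}

estimate_repair_seconds 200 100   # prints 6144 (seconds), about 1.7 hours
```

Even with an optimistic 100 MB/s of sustained throughput, a 200GB database would be unavailable for well over an hour, which is why an unattended repair at boot is worth questioning.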
But most production installs are running replica sets. In this case, instead of repairing, you can just restore from a backup. Restoring from a backup only writes the data once and it's a process you should already have in place.
Despite the init.d script returning OK, your system monitoring should tell you that the DB is not up.
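As a sketch of what such monitoring could key on: Red Hat style init scripts follow the LSB convention for the exit code of `status`, where 2 corresponds to the "dead but subsys locked" message seen in the question. The function names below are hypothetical, and the status command is passed in as arguments rather than hard-coded.

```shell
#!/bin/sh
# Sketch of a monitoring check that keys on the exit code of a "status"
# command rather than on the "[ OK ]" printed by "start".
# Exit codes follow the LSB convention used by Red Hat style init scripts.

# Translate a status exit code into a one-line verdict.
describe_status() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead but pid file exists" ;;
        2) echo "dead but subsys locked" ;;
        3) echo "stopped" ;;
        *) echo "unknown (exit code $1)" ;;
    esac
}

# usage: check_mongod COMMAND [ARGS...]
# Runs the given status command and returns non-zero unless it reports "running".
check_mongod() {
    code=0
    "$@" >/dev/null 2>&1 || code=$?
    echo "mongod: $(describe_status "$code")"
    [ "$code" -eq 0 ]
}
```

A cron job or monitoring agent could run `check_mongod /etc/init.d/mongod status` and alert on a non-zero exit, independently of what `start` printed.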
Just want to point out that journaling does work in the 32-bit version. However, it is not on by default in 32-bit.