Restarting / auto-repairing MongoDB in production

Posted on 2024-12-18 06:00:09

What I want is an /etc/init.d script that starts MongoDB more reliably, even after it went down hard: it should attempt an automatic repair if the database was left in a locked state.

Yes, I could script this myself, but I think somebody out there must have done this already.

I noticed that after a server goes down hard, MongoDB is left in a state where it does not restart via the /etc/init.d/mongod script. Apparently the lock file(s) need to be removed, and mongod needs to be run once with the --repair option and the correct --dbpath, before it can be restarted successfully. In some cases one also needs to change the ownership of the db files to the user who runs MongoDB. An additional problem is that the standard /etc/init.d/mongod script does not report a failure in this situation. Instead it incorrectly returns an "OK" status, reporting that mongod was started although it was not.

$ sudo /etc/init.d/mongod start
Starting mongod: forked process: 9220
all output going to: /data/mongo/log/mongod.log
                                                           [  OK  ]
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
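
One way to catch this false "OK" would be to check, shortly after the init script returns, that the forked process is still alive. A minimal sketch, assuming mongod writes its PID to a pid file and that the check runs as root, as an init script normally would. The helper names pid_alive and start_and_verify and the VERIFY_DELAY variable are hypothetical, not part of the stock script:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID stored in the given file
# belongs to a live process. kill -0 sends no signal, it only checks.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical wrapper: run the start command, wait a moment so that a
# mongod which exits right after forking is noticed, then verify.
# $1 = pid file, remaining arguments = the command that starts the daemon.
start_and_verify() {
    pidfile=$1
    shift
    "$@" || return 1
    sleep "${VERIFY_DELAY:-2}"
    if pid_alive "$pidfile"; then
        echo "started"
    else
        echo "failed"
        return 1
    fi
}
```

It could be called as `start_and_verify /var/run/mongo/mongod.pid /etc/init.d/mongod start`, where the pid file path is an assumption and has to match whatever mongod is configured to write. The two-second delay is a guess, not a guarantee that startup has finished.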

The OS is either CentOS or Fedora.

Does anybody have a modified /etc/init.d script, or a pointer to one, that attempts a repair automatically in that situation? Or is there another tool that functions as a watchdog for mongod?
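
To make the watchdog idea concrete, here is a minimal sketch of a check that could be run periodically, for example from cron. It assumes the init script's status action exits non-zero when mongod is dead (the LSB convention, which I have not verified for the stock mongod script). The names STATUS_CMD, START_CMD and watchdog_once are hypothetical:

```shell
#!/bin/sh
# Commands are assumptions and can be overridden from the environment.
STATUS_CMD=${STATUS_CMD:-/etc/init.d/mongod status}
START_CMD=${START_CMD:-/etc/init.d/mongod start}

# Run one check: if the status command fails, try to start the service.
# The variables are expanded unquoted on purpose so that
# "command argument" strings are split into words.
watchdog_once() {
    if $STATUS_CMD >/dev/null 2>&1; then
        echo "up"
        return 0
    fi
    echo "down, restarting"
    $START_CMD
}
```

This only restarts the service. It would not help in the locked state described here unless the start command also handled the repair.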

Any opinions on why it might be a bad idea to try to repair MongoDB automatically?

$ sudo /etc/init.d/mongod status
mongod dead but subsys locked

$ sudo ls -l /var/lib/mongo/mongod.lock 
-rw-r--r--. 1 mongod mongod 5 Nov 19 11:52 /var/lib/mongo/mongod.lock


$ sudo tail -50 /data/mongo/log/mongod.log
************** 
old lock file: /data/mongo/db/mongod.lock.  probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Sat Nov 19 11:55:44 exception in initAndListen std::exception: old lock file, terminating
Sat Nov 19 11:55:44 dbexit: 

Sat Nov 19 11:55:44 shutdown: going to close listening sockets...
Sat Nov 19 11:55:44 shutdown: going to flush oplog...
Sat Nov 19 11:55:44 shutdown: going to close sockets...
Sat Nov 19 11:55:44 shutdown: waiting for fs preallocator...
Sat Nov 19 11:55:44 shutdown: closing all files...
Sat Nov 19 11:55:44     closeAllFiles() finished

Sat Nov 19 11:55:44 dbexit: really exiting now
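
For reference, the manual recovery recommended by the log message could be sketched roughly as follows. This is only an illustration under assumptions: the data directory and user are taken from the output above, the helper names run and repair_if_locked and the DRY_RUN switch are made up, and by default it prints the commands instead of running them. Whether doing this unattended is safe is exactly the open question.

```shell
#!/bin/sh
# Assumed values, taken from the log output above. Adjust as needed.
DBPATH=${DBPATH:-/data/mongo/db}
MONGO_USER=${MONGO_USER:-mongod}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repair_if_locked() {
    lock="$DBPATH/mongod.lock"
    # Assumption: a non-empty lock file while no mongod is running
    # indicates an unclean shutdown.
    if [ ! -s "$lock" ]; then
        echo "no stale lock at $lock, nothing to do"
        return 0
    fi
    if pgrep -x mongod >/dev/null 2>&1; then
        echo "mongod is running, refusing to repair" >&2
        return 1
    fi
    run rm -f "$lock"
    # Fix ownership first, then repair as the mongod user so that no
    # root-owned files are left behind in the data directory.
    run chown -R "$MONGO_USER:$MONGO_USER" "$DBPATH"
    run su -s /bin/sh "$MONGO_USER" -c "mongod --repair --dbpath '$DBPATH'"
}
```

With DRY_RUN=0 and run as root, repair_if_locked would execute the three commands. The normal /etc/init.d/mongod start would follow.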
