Restarting / auto-repairing MongoDB in production

Posted 2024-12-18 06:00:09

I want an /etc/init.d script that starts MongoDB more reliably, even after it went down hard: it should attempt an automatic repair when the system is left in a locked state.

Yes, I could script this myself, but I think somebody out there must have done this already.

I noticed that after a server goes down hard, MongoDB is left in a state where it won't restart via the /etc/init.d/mongod script. Apparently the lock file(s) need to be removed, and mongod needs to be run once with the --repair option and the correct --dbpath before it can be restarted successfully. In some cases one also needs to change the ownership of the db files back to the user who runs mongod. An additional problem is that the standard /etc/init.d/mongod script does not report a failure in this situation; it happily and incorrectly returns an "OK" status, reporting that mongod was started although it wasn't.

$ sudo /etc/init.d/mongod start
Starting mongod: forked process: 9220
all output going to: /data/mongo/log/mongod.log
                                                           [  OK  ]
$ sudo /etc/init.d/mongod status
mongod dead but subsys locked
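
Since the init script prints "OK" as soon as the process has forked, one way to catch this case is to check a moment later whether the process is still alive. The following is only a minimal sketch: the helper names pid_alive and start_and_verify are hypothetical, and it assumes mongod writes a pidfile whose path you pass in (the question does not say where that file lives).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID stored in a pidfile
# belongs to a live process. Run it as root or as the mongod user,
# because kill -0 fails on processes you may not signal.
pid_alive() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1          # missing or empty pidfile
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # not a plain number
    esac
    kill -0 "$pid" 2>/dev/null             # signal 0 only tests for existence
}

# Hypothetical wrapper: run the stock init script, wait a few seconds,
# then report success only if the process survived startup.
start_and_verify() {
    init_script=$1 pidfile=$2 wait_secs=${3:-5}
    "$init_script" start || return 1
    sleep "$wait_secs"
    pid_alive "$pidfile"
}
```

A wrapper script could call start_and_verify with /etc/init.d/mongod and the pidfile path, and exit non-zero when it fails, so that the caller no longer sees a false "OK".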

The OS is either CentOS or Fedora.

Does anybody have a modified /etc/init.d script, or a pointer to one, that attempts a repair automatically in that situation? Or is there another tool that functions as a watchdog for mongod?

Any opinions on why it might be a bad idea to try to automatically repair mongodb?

$ sudo /etc/init.d/mongod status
mongod dead but subsys locked

$ sudo ls -l /var/lib/mongo/mongod.lock 
-rw-r--r--. 1 mongod mongod 5 Nov 19 11:52 /var/lib/mongo/mongod.lock


$ sudo tail -50 /data/mongo/log/mongod.log
************** 
old lock file: /data/mongo/db/mongod.lock.  probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Sat Nov 19 11:55:44 exception in initAndListen std::exception: old lock file, terminating
Sat Nov 19 11:55:44 dbexit: 

Sat Nov 19 11:55:44 shutdown: going to close listening sockets...
Sat Nov 19 11:55:44 shutdown: going to flush oplog...
Sat Nov 19 11:55:44 shutdown: going to close sockets...
Sat Nov 19 11:55:44 shutdown: waiting for fs preallocator...
Sat Nov 19 11:55:44 shutdown: closing all files...
Sat Nov 19 11:55:44     closeAllFiles() finished

Sat Nov 19 11:55:44 dbexit: really exiting now
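
For illustration, the manual recovery described above (remove the lock file, run --repair with the right --dbpath, fix ownership) could be sketched as below. This is an assumption-laden sketch, not a tested init script: the function names are hypothetical, the non-empty lock file test is taken from the 5-byte mongod.lock in the listing above, and the plan is only printed rather than executed so that an operator can review it first. It should only be used when no mongod process is running.

```shell
#!/bin/sh
# Succeeds when the lock file exists and is non-empty, which the
# mongod versions discussed here treat as a sign of an unclean shutdown.
lock_looks_stale() {
    [ -s "$1" ]
}

# Print (do not run) the recovery steps described in the question.
# dbpath and user are supplied by the caller.
print_repair_plan() {
    dbpath=$1 user=$2
    echo "rm -f $dbpath/mongod.lock"
    echo "mongod --dbpath $dbpath --repair"
    echo "chown -R $user:$user $dbpath"
}
```

For the setup in the log above, print_repair_plan /data/mongo/db mongod would list the three commands in order. The chown comes last because a repair run as root can leave root-owned files behind.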

Comments (2)

水溶 2024-12-25 06:00:09

The first thing to mention is journaling. Journaling is effectively billed as "fast repair". It is on by default in 2.0+, and it will perform that "repair" by default.

So if your disks can handle the extra write throughput of journaling, this may solve your problem.

Any opinions on why it might be a bad idea to try to automatically repair mongodb?

The #1 issue with repairing MongoDB automatically is simply time.

If you have a 200GB database, the system will need to do the following when repairing:

  1. Allocate ~200GB of files (do you have the drive space?)
  2. Read all of the data from the existing files into memory (200GB read)
  3. Check each document for validity and write it back to the new files (200GB write)
  4. Re-create all indexes (200GB reads + large number of writes)
  5. Flush everything to disk
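
Step 1 suggests a simple guard for any automated repair: refuse to start it unless the filesystem has room for roughly a second copy of the data. The sketch below is one way to check that, assuming the repair output lands on the same filesystem as the dbpath; the function names are hypothetical and the sizes are in kilobytes as reported by du -sk and df -Pk.

```shell
#!/bin/sh
# Pure comparison: is the free space larger than the current data size?
has_room_for_repair() {
    data_kb=$1 avail_kb=$2
    [ "$avail_kb" -gt "$data_kb" ]
}

# Measure the data directory and the free space on its filesystem,
# then apply the comparison above.
check_repair_space() {
    dbpath=$1
    data_kb=$(du -sk "$dbpath" | awk '{print $1}')
    avail_kb=$(df -Pk "$dbpath" | awk 'NR==2 {print $4}')
    has_room_for_repair "$data_kb" "$avail_kb"
}
```

With a 200GB database (209715200 KB) and only 100GB free, has_room_for_repair 209715200 104857600 fails, and an automated script could stop there and alert instead of filling the disk.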

As the notes in parentheses show, that's a serious amount of drive thrashing to perform a repair.

But most production installs run replica sets. In that case, instead of repairing, you can just restore from a backup. Restoring from a backup writes the data only once, and it's a process you should already have in place.

Despite the init.d script returning OK, your system monitoring should tell you that the DB is not up.

遗弃M 2024-12-25 06:00:09

Just want to point out that journaling does work in the 32-bit version. However, it is not on by default in 32-bit.
