How do I combine process and file checks in Monit?

Posted 2025-01-03 04:03:25

Summary

How can I combine multiple checks in Monit? I want to check against process activity and file content/timestamp.


Long and boring explanation

I'm setting up Monit to keep my Bukkit Minecraft server up. It runs several checks. At the moment I have this configuration:

#!monit

check process bukkit pidfile /var/run/bukkit.pid # check if the java process is running
    start program = "/sbin/start bukkit"         # start with Upstart
    stop program  = "/sbin/stop bukkit"          # stop with Upstart

    if failed                                    # send a noop request to check if the server responds
        host cubixcraft.de port 20059 protocol http
        and request "/api/call?method=runConsoleCommand&args=%5B%22noop%22%5D&key=d9c7f3f6be0c92c1b2725f0e5a3352514cee0885c3bf7e0189a76bbaf2f4d7a7"
            with checksum e006695c8da58e03f17a305afd1a1a32
            timeout 20 seconds for 2 cycles
    then restart                                 # restart if it fails

It works... but it's slow. If something goes wrong, I have to wait 20 seconds until the server gets terminated. But I need that timeout because the server reloads from time to time (to refresh the configuration, clean up memory, etc.), which causes short lags. Without "timeout 20 seconds for 2 cycles", the server would be terminated immediately whenever it reloads.

In itself, waiting 20 seconds for a restart would be fine if something really went wrong. But most of the time when something goes wrong, all security mechanisms on the server stop working.

Because of that, I need a way to restart the server immediately if it doesn't respond, but give it some time when it is reloading.

I have this approach in mind: the server writes to a log file whenever any command is issued (including reloads and the API calls I use to check the server status). So the timestamp of the log file is the timestamp of the last command. During a reload, nothing gets written to the file. So I can detect a reload with a simple timestamp check and give the server its 20 seconds only if it is currently reloading.
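The timestamp half of this idea could be sketched as a small shell helper. This is only an illustration, not a tested Monit setup: the function names log_age_seconds and log_is_fresh and the log path in the comment are hypothetical, and it assumes GNU coreutils (stat -c %Y, date +%s). How the result gets combined with the HTTP check is left open, as in the question.

```shell
#!/bin/sh
# Sketch only: log_age_seconds and log_is_fresh are hypothetical helper names.
# Assumes GNU coreutils (`stat -c %Y`, `date +%s`), as on a typical Linux server.

# Print the number of seconds since FILE ($1) was last modified.
# Returns 2 if the file cannot be inspected.
log_age_seconds() {
    _mtime=$(stat -c %Y "$1") || return 2
    _now=$(date +%s)
    echo $(( _now - _mtime ))
}

# Succeed (exit 0) if FILE ($1) was modified within the last MAX_AGE ($2) seconds,
# fail with 1 if it is older, and with 2 if the file cannot be inspected.
log_is_fresh() {
    _age=$(log_age_seconds "$1") || return 2
    [ "$_age" -le "$2" ]
}

# Example (hypothetical path):
#   log_is_fresh /path/to/server.log 20 && echo "a command was logged in the last 20 seconds"
```

Monit's own "check file" service also has a timestamp test, so a helper like this may only be needed when the decision has to be made inside a wrapper script.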

Comments (1)

_失温 2025-01-10 04:03:25

I managed to do this by overriding the start program:

start program = "/bin/bash -c '/usr/bin/monit unmonitor bukkit; /sbin/start bukkit; sleep 20; /usr/bin/monit monitor bukkit'" with timeout 25 seconds

This worked in monit/5.5, but in monit/5.14 it only works sometimes. Since monit/5.14 receives the unmonitor while it is still starting the program, it waits for the start to finish before actually performing the unmonitor, which means the monitor fires too early and gets rejected.
