How can I combine process and file checks in Monit?
Summary
How can I combine multiple checks in Monit? I want to check against process activity and file content/timestamp.
Long and boring explanation
I'm working on a Monit daemon for keeping my Bukkit Minecraft server up. It does several checks. At the moment I have this code:
#!monit
check process bukkit pidfile /var/run/bukkit.pid # check that the Java process is running
  start program = "/sbin/start bukkit" # start with Upstart
  stop program = "/sbin/stop bukkit" # stop with Upstart
  if failed # send a noop request to check whether the server responds
    host cubixcraft.de port 20059 protocol http
    and request "/api/call?method=runConsoleCommand&args=%5B%22noop%22%5D&key=d9c7f3f6be0c92c1b2725f0e5a3352514cee0885c3bf7e0189a76bbaf2f4d7a7"
    with checksum e006695c8da58e03f17a305afd1a1a32
    timeout 20 seconds for 2 cycles
  then restart # restart if the check fails
It works... but it's slow. If something goes wrong, I have to wait 20 seconds until the server gets terminated. But I need that timeout because the server reloads from time to time (to refresh the configuration, clean up memory, etc.), and each reload causes a short lag. Without "timeout 20 seconds for 2 cycles", the server would be terminated immediately whenever it reloads.
Okay, waiting 20 seconds for a restart would be no problem if something really went wrong. But most of the time when something goes wrong, all security mechanisms on the server stop working.
Because of that, I need a way to restart the server immediately if it doesn't respond, but give it some time when it is reloading.
I have this approach: the server writes something to a logfile whenever any command is issued (including reloads and the API calls I use to check the server status). So the timestamp of the logfile is the timestamp of the last command. During a reload, nothing gets written to the file. So I can detect a reload with a simple timestamp check, and give the server its 20 seconds only while it is reloading.
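To make the timestamp idea concrete, here is a minimal sketch of a separate file check. The service name bukkit_log and the log path are hypothetical, and the alert action is only a placeholder. It shows the file test on its own; making the process check's timeout depend on it is the open part of the question.

```
#!monit
# Hypothetical log path; replace with the file the server actually writes to.
check file bukkit_log with path /path/to/bukkit/server.log
  # The log goes quiet during a reload, so a stale timestamp is the signal.
  # With the noop request logged on every poll, a log older than 20 seconds
  # means no command has been recorded for that long.
  if timestamp > 20 seconds then alert
```

Monit evaluates its tests once per poll cycle, so the effective granularity of this check is the daemon's poll interval rather than exactly 20 seconds.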
Comments (1)
I managed to do this by overriding the start program:
This worked in monit/5.5, but in monit/5.14 it only works sometimes. Since monit/5.14 receives the unmonitor while it is starting the program, it waits for start to finish before actually performing the unmonitor, which means the monitor fires too early and gets rejected.
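The snippet that belonged to this answer did not survive on this page. As a rough illustration only, and not the answerer's actual configuration, overriding the start program so that it pauses a separate file check during startup might look like the sketch below. The service name bukkit_log, the 20 second sleep, and the assumption that the monit binary is on the PATH Monit gives to programs are all mine.

```
#!monit
check process bukkit pidfile /var/run/bukkit.pid
  # Assumed wrapper: pause the hypothetical bukkit_log file check,
  # start the server through Upstart, wait, then resume the file check.
  start program = "/bin/sh -c 'monit unmonitor bukkit_log; /sbin/start bukkit; sleep 20; monit monitor bukkit_log'"
  stop program = "/sbin/stop bukkit"
```

A wrapper of this shape depends on Monit handling the unmonitor request before the later monitor request arrives, which is the ordering the answer reports as unreliable in monit/5.14.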