Monitor and kill runaway processes using 100% IO?
I have a few processes that have to run at high priority (chrt 98). Occasionally one of them hard-locks and pegs one core at 100%, which is not a huge deal. More importantly, it uses all the IO on the system, so much that it's impossible to log into the machine via ssh to kill it, or to perform any task that isn't already loaded into RAM. If I happen to have something like htop already running, I can end the process without trouble. Is there any utility or method to monitor for this type of runaway process and kill anything that uses 100% of system IO for more than X amount of time? Thanks!
Comments (3)
Can't you start the program with nice (and with a lower priority)? That way you should at least be able to ssh into the box and kill it easily. The better solution would of course be to fix the behaviour of the offending process (details needed).
This serverfault thread also seems to contain what you ask for specifically.
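The nice suggestion can be sketched in Python as follows. This is only an illustration, assuming the program can run under the normal scheduler rather than under chrt; the helper name run_niced and the increment of 10 are made up for the example.

```python
import os
import subprocess
import sys


def run_niced(command, increment=10, **popen_kwargs):
    """Start `command` with its niceness raised by `increment` (lower priority).

    `run_niced` is a hypothetical helper, not part of any library.
    """
    return subprocess.Popen(
        command,
        # preexec_fn runs in the child just before exec, so only the
        # child's niceness changes, not the launcher's.
        preexec_fn=lambda: os.nice(increment),
        **popen_kwargs,
    )


if __name__ == "__main__":
    # Demonstration: the child prints its own niceness (os.nice(0) adds 0
    # and returns the current value).
    child = run_niced(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        stdout=subprocess.PIPE,
        text=True,
    )
    print("child niceness:", child.communicate()[0].strip())
```

Raising niceness never needs extra privileges, so a launcher like this can run as an ordinary user.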
Assuming that it's disk IO that the app is consuming, can you move the filesystems it's accessing onto separate disks? That way you'll have IO to spare on the disks the OS is installed on, and you should be able to log in and manage (i.e. kill!) the process.
As another poster said, running your process with nice is the way to go, but you did mention that you want to run it at a high priority, which is odd... Be aware that if you're running a process at the highest priority and it's pegged, your monitoring system might not even be able to kill it, unless your monitor is at a higher priority still. Anyway, god, as well as several other process management tools, can easily kill a process if it's misbehaving in any of several ways. In the config you set checks at a particular interval, and then you can say "after five checks, nuke it if it's been above 98% CPU usage consistently".
Another, different take that you might have a look at is chpst from the runit system. It allows you to elegantly set bounds on things (but for CPU limiting, nice is still the tool I'd reach for first).
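The "after five checks, nuke it" idea can also be sketched as a small standalone watchdog. This is not god's configuration or implementation, just a rough Python illustration applied to the disk IO counters in /proc/<pid>/io on Linux. The names SustainedBreach, parse_proc_io, and watch, as well as the default interval, are assumptions for the example. Reading another user's /proc/<pid>/io normally requires root, and as noted above the watchdog itself would need to run at a higher priority than the process it polices.

```python
import os
import signal
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = int(value)
    return fields


class SustainedBreach:
    """Trips once `times` consecutive samples exceed `limit`."""

    def __init__(self, limit, times):
        self.limit = limit
        self.times = times
        self.streak = 0

    def update(self, sample):
        # Any sample at or below the limit resets the streak.
        self.streak = self.streak + 1 if sample > self.limit else 0
        return self.streak >= self.times


def watch(pid, limit_bytes_per_s, times=5, interval=5.0):
    """Poll disk bytes for `pid`; SIGKILL it after `times` breaches in a row."""
    detector = SustainedBreach(limit_bytes_per_s, times)
    previous = None
    while True:
        try:
            with open(f"/proc/{pid}/io") as handle:
                counters = parse_proc_io(handle.read())
        except FileNotFoundError:
            return  # the process has already exited
        total = counters["read_bytes"] + counters["write_bytes"]
        if previous is not None:
            rate = (total - previous) / interval
            if detector.update(rate):
                os.kill(pid, signal.SIGKILL)
                return
        previous = total
        time.sleep(interval)
```

A call such as watch(1234, 50 * 1024 * 1024) would kill PID 1234 after five consecutive 5-second samples above roughly 50 MiB/s; both numbers are placeholders to tune for the actual workload.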