Monitoring and killing runaway processes that use 100% IO?

Posted 2024-08-31 18:46:02

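I have some processes that must run at high priority (chrt 98). Occasionally one of them hard-locks and pegs one core at 100% (no big deal), but more importantly it uses all the IO on the system, to the point that it is impossible to ssh into the machine to kill it, or to do anything on the machine that isn't already loaded into RAM. If I happen to already have something like htop running, I can end the process just fine. Is there any utility or method that can watch for this kind of runaway process and kill any process that uses 100% of system IO time for longer than X amount of time? Thanks!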

过度放纵 2024-09-07 18:46:02

Can't you start the program with nice (at a lower priority)? That way you should at least be able to ssh into the box and kill it without trouble.
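As a rough illustration of the nice suggestion, here is a minimal Ruby sketch that starts a command with a raised nice value. The helper name `spawn_niced` is made up for this example and is not part of any tool mentioned in the thread. Note that nice values only matter under the default scheduling policy; a process started with chrt 98 runs under a real-time policy, so this applies only if the process can give that up.

```ruby
# Hypothetical helper (not from the thread): start `command` with its nice
# value raised by `increment`, similar in spirit to `nice -n <increment> cmd`.
# Returns the child's PID.
def spawn_niced(increment, *command)
  fork do
    current = Process.getpriority(Process::PRIO_PROCESS, 0)
    # 19 is the highest nice value (lowest priority) on Linux.
    target = [current + increment, 19].min
    Process.setpriority(Process::PRIO_PROCESS, 0, target)
    # The nice value is inherited across exec.
    exec(*command)
  end
end

# Example (assumed command name):
#   pid = spawn_niced(10, "my_worker", "--some-flag")
#   Process.wait(pid)
```

Raising the nice value never needs extra privileges, which is why the sketch only ever moves it upward.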

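The better solution, of course, is to fix the offending process's behavior (details needed).

This serverfault thread also seems to contain exactly what you're asking for.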


日暮斜阳 2024-09-07 18:46:02

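Assuming the application is consuming disk IO, can you move the filesystem it accesses onto a separate disk? That way you will have free IO on the disk the operating system is installed on, and you should be able to log in and manage (i.e. kill!) the process.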


酒与心事 2024-09-07 18:46:02

As another poster said, running your process with nice is the way to go, but you did mention that you want to run it at a high priority, which is odd... Be aware that if you're running a process at the highest priority and it's pegged, your monitoring system might not even be able to kill it, unless your monitor runs at a higher priority still. Anyway...

god, as well as several other process management tools, can easily kill a process if it's misbehaving in any of several ways. The config looks like this: you set checks at a particular interval, and then you can say "after five checks, nuke it if it's been above 98% CPU usage consistently":

  restart.condition(:cpu_usage) do |c|
    c.above = 98.percent
    c.times = 5
  end
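The "N checks in a row above a limit" rule in that config is easy to reproduce for IO as well. The following is only a sketch under stated assumptions, not how god implements it: the names `ConsecutiveThreshold`, `io_bytes` and `watch` are invented for this example, and it uses bytes per second from `/proc/<pid>/io` (Linux only) as a stand-in for "100% of IO time", which is not the same measurement.

```ruby
# Tracks whether a sampled value has stayed above `limit`
# for `times` consecutive checks.
class ConsecutiveThreshold
  def initialize(limit:, times:)
    @limit = limit
    @times = times
    @streak = 0
  end

  # Record one sample. Returns true once the limit has been exceeded
  # on `times` checks in a row; any sample at or below the limit
  # resets the streak.
  def tripped?(sample)
    @streak = sample > @limit ? @streak + 1 : 0
    @streak >= @times
  end
end

# Parse the text of /proc/<pid>/io ("key: value" lines) and return
# read_bytes + write_bytes, i.e. bytes that actually hit storage.
def io_bytes(proc_io_text)
  pairs = proc_io_text.lines.map { |line| line.strip.split(": ", 2) }
  fields = pairs.select { |pair| pair.size == 2 }.to_h
  fields.fetch("read_bytes").to_i + fields.fetch("write_bytes").to_i
end

# Poll one process and SIGKILL it once its IO rate has stayed above
# `bytes_per_sec` for `times` consecutive checks. All limits here are
# assumptions to be tuned per machine.
def watch(pid, bytes_per_sec:, times:, interval: 1.0)
  check = ConsecutiveThreshold.new(limit: bytes_per_sec, times: times)
  last = io_bytes(File.read("/proc/#{pid}/io"))
  loop do
    sleep interval
    now = io_bytes(File.read("/proc/#{pid}/io"))
    rate = (now - last) / interval
    last = now
    next unless check.tripped?(rate)
    Process.kill("KILL", pid)
    break
  end
end

# Example (assumed values): kill PID 1234 after five one-second checks
# above 50 MB/s.
#   watch(1234, bytes_per_sec: 50 * 1024 * 1024, times: 5)
```

Reading another user's `/proc/<pid>/io` generally needs elevated privileges, and the caveat above still applies: the watcher only helps if it is scheduled ahead of the process it is policing.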

Another, different take that you might have a look at is chpst from the runit system - it allows you to elegantly set bounds on things (but for CPU limiting, nice is still the tool I'd reach for first).
