What is an uninterruptible process?

Posted on 2024-07-08 08:59:30

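Sometimes, whenever I write a program in Linux and it crashes due to a bug of some sort, it becomes an uninterruptible process and keeps running until I restart my computer (even if I log out). My questions are: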

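What causes a process to become uninterruptible?
How do I stop that from happening?
This is probably a dumb question, but is there any way to interrupt it without restarting my computer?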

空城缀染半城烟沙 2024-07-15 08:59:30

An uninterruptible process is a process that happens to be in a system call (kernel function) that cannot be interrupted by a signal.

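To understand what that means, you need to understand the concept of an interruptible system call. The archetypal example is read(). This is a system call that can take a long time (seconds), since it can potentially involve spinning up a hard drive or moving heads. During most of this time, the process will be sleeping, blocked on the hardware.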

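While the process is sleeping in the system call, it can receive a Unix asynchronous signal (for example, SIGTERM), and then the following happens:

The system call exits prematurely and is set up to return -EINTR to user space (the C library reports this as a return value of -1 with errno set to EINTR).
The signal handler is executed.
If the process is still running, it gets the return value from the system call, and it can make the same call again.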
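These three steps can be watched from user space. The following is a minimal Python sketch, assuming Linux (or another Unix) and CPython 3.5 or later. It blocks in os.read() on an empty pipe and arranges for SIGALRM, standing in for any asynchronous signal, to arrive 100 ms later. The kernel makes the read fail with EINTR and the Python-level handler runs. Because the handler returns normally, CPython then reissues the same read(), which is the retry behaviour described in PEP 475. The handler writes one byte so that the retried call has something to return. The function name is illustrative only.

```python
from __future__ import annotations

import os
import signal


def read_interrupted_then_retried() -> tuple[bytes, list[int]]:
    """Block in read() on an empty pipe until a signal handler supplies data."""
    r, w = os.pipe()
    seen: list[int] = []

    def on_alarm(signum, frame):
        # Runs after the kernel has made the blocked read() fail with EINTR.
        seen.append(signum)
        os.write(w, b"x")  # give the retried read() something to return

    # signal.signal() installs the handler without SA_RESTART, so read() is interruptible.
    previous = signal.signal(signal.SIGALRM, on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, 0.1)  # deliver SIGALRM in 100 ms
        data = os.read(r, 1)  # sleeps in the kernel; CPython retries it after EINTR
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer if still armed
        signal.signal(signal.SIGALRM, previous)
        os.close(r)
        os.close(w)
    return data, seen
```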

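Returning early from the system call lets user-space code change its behavior immediately in response to the signal, for example by terminating cleanly in reaction to SIGINT or SIGTERM.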
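The early return is what makes a clean shutdown possible. This is a minimal Python sketch, assuming Linux or another Unix. The handler raises an exception instead of returning, so CPython abandons the interrupted os.read() rather than retrying it. The program therefore reaches its clean-up path even though no data ever arrives. SIGALRM stands in for SIGTERM so that the process can signal itself on a timer; a real program would register the handler for signal.SIGTERM or signal.SIGINT instead. The class and function names are hypothetical.

```python
import os
import signal


class Shutdown(Exception):
    """Raised by the signal handler to unwind out of a blocking call."""


def wait_for_input_or_signal() -> str:
    r, w = os.pipe()  # nothing is ever written, so read() would block forever

    def on_signal(signum, frame):
        raise Shutdown  # abandon the interrupted read() instead of retrying it

    # SIGALRM stands in for SIGTERM here so the process can signal itself on a timer.
    previous = signal.signal(signal.SIGALRM, on_signal)
    try:
        signal.setitimer(signal.ITIMER_REAL, 0.1)
        try:
            os.read(r, 1)
            return "got input"
        except Shutdown:
            # Clean-up (flushing files, removing temp files, ...) would go here.
            return "clean shutdown"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
        os.close(r)
        os.close(w)
```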

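On the other hand, some system calls are not allowed to be interrupted in this way. If such a system call stalls for some reason, the process can remain in this unkillable state indefinitely.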

LWN ran a good article that touched on this topic in July.

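To answer the original questions:

How to prevent this from happening: figure out which driver is causing you trouble, and either stop using it or become a kernel hacker and fix it.
How to kill an uninterruptible process without rebooting: somehow make the system call terminate. Frequently, the most efficient way to do this without hitting the power switch is to pull the power cord. You can also become a kernel hacker and make the driver use TASK_KILLABLE, as explained in the LWN article.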
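For the first step, finding which driver is involved, here is a minimal sketch (Linux only, Python 3). It walks /proc/<pid>/task/<tid>/stat and keeps the threads whose state field is D (uninterruptible sleep). For each one it reports the kernel function the thread is sleeping in, taken from the wchan file. The helper names are hypothetical. Without sufficient privileges wchan may read as 0. As root, /proc/<pid>/stack shows a full kernel stack trace, and ps -eo pid,stat,wchan:32,comm reports similar information.

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    pid, _, rest = line.partition(" (")
    # comm may contain spaces or ')', so the state follows the LAST ')'.
    comm, _, after = rest.rpartition(")")
    return int(pid), comm, after.split()[0]


def read_wchan(task_dir: str) -> str:
    """Kernel function the task sleeps in; may read as '0' without privileges."""
    try:
        with open(f"{task_dir}/wchan") as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


def uninterruptible_tasks(proc: str = "/proc") -> list[tuple[int, str, str]]:
    """Return (tid, comm, wchan) for every thread currently in state 'D'."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            tids = os.listdir(f"{proc}/{pid}/task")
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = f"{proc}/{pid}/task/{tid}"
            try:
                with open(f"{base}/stat") as f:
                    task_id, comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited mid-scan
            if state == "D":
                found.append((task_id, comm, read_wchan(base)))
    return found


if __name__ == "__main__":
    for tid, comm, wchan in uninterruptible_tasks():
        print(f"{tid:>7} {comm:<20} {wchan}")
```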

笑梦风尘 2024-07-15 08:59:30

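When a process is in user mode, it can be interrupted at any time (switching to kernel mode). When the kernel returns to user mode, it checks whether any signals are pending (including the ones used to kill the process, such as SIGTERM and SIGKILL). This means a process can be killed only on return to user mode.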
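Linux exposes each task's pending-signal bitmasks in /proc/<pid>/status: SigPnd for the thread and ShdPnd for the whole process. Bit n-1 of each mask stands for signal n. Here is a minimal decoder sketch with hypothetical function names. If you kill -9 a task stuck in D state, SIGKILL would be expected to appear in one or both masks while the task stays where it is, because it has not yet returned to user mode.

```python
from __future__ import annotations

import os
import signal


def decode_sigmask(hex_mask: str) -> list[str]:
    """Turn a /proc status mask such as '0000000000000100' into signal names."""
    mask = int(hex_mask, 16)
    names = []
    for signo in range(1, 65):
        if mask & (1 << (signo - 1)):  # bit n-1 represents signal n
            try:
                names.append(signal.Signals(signo).name)
            except ValueError:
                names.append(f"signal {signo}")  # e.g. real-time signals
    return names


def pending_signals(pid: int) -> dict[str, list[str]]:
    """Signals queued for one thread (SigPnd) and for its whole process (ShdPnd)."""
    pending = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("SigPnd", "ShdPnd"):
                pending[key] = decode_sigmask(value.strip())
    return pending


if __name__ == "__main__":
    print(pending_signals(os.getpid()))
```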

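The reason a process cannot be killed in kernel mode is that doing so could corrupt the kernel structures used by all the other processes on the same machine (in the same way that killing a thread can corrupt data structures used by other threads in the same process).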

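When the kernel needs to do something that could take a long time (for instance, waiting on a pipe written by another process, or waiting for the hardware to do something), it sleeps: it marks the process as sleeping and calls the scheduler to switch to another process. If there is no non-sleeping process, it switches to a "dummy" process that tells the CPU to slow down a bit and sits in a loop (the idle loop).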
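The pipe case can be observed on Linux. This is a minimal sketch, assuming cat is on PATH. It starts cat with an empty stdin pipe, so cat blocks in read() waiting for data that another process (here, the parent) would write. The kernel has put it to sleep, and /proc/<pid>/stat reports state S, which is interruptible sleep. The function name is illustrative.

```python
import subprocess
import time


def state_of_blocked_pipe_reader(timeout: float = 5.0) -> str:
    """Start `cat` on an empty pipe and report its /proc state once it sleeps."""
    child = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.DEVNULL)
    try:
        state = "?"
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with open(f"/proc/{child.pid}/stat") as f:
                # The state letter is the first field after the last ')'.
                state = f.read().rpartition(")")[2].split()[0]
            if state == "S":  # interruptible sleep inside read()
                break
            time.sleep(0.01)
        return state
    finally:
        child.stdin.close()  # EOF wakes cat up and it exits normally
        child.wait()
```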

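If a signal is sent to a sleeping process, the process has to be woken up before it can return to user space and thus process the pending signal. This is where the two main types of sleep differ:

TASK_INTERRUPTIBLE, the interruptible sleep. If a task is marked with this flag, it is sleeping but can be woken by signals. This means the code that marked the task as sleeping is expecting a possible signal, and after it wakes up it will check for one and return from the system call. After the signal is handled, the system call can potentially be restarted automatically (I won't go into the details of how that works).
TASK_UNINTERRUPTIBLE, the uninterruptible sleep. If a task is marked with this flag, it does not expect to be woken up by anything other than whatever it is waiting for, either because the operation cannot easily be restarted or because programs expect the system call to be atomic. This state can also be used for sleeps that are known to be very short.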

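TASK_KILLABLE (mentioned in the LWN article linked to by ddaa's answer) is a newer variant: an uninterruptible sleep that can still be woken by a fatal signal such as SIGKILL.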
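The wake-up rules for the three states can be summarised as a toy model. This is a hypothetical Python sketch of the decision only, not kernel code. The fatal parameter is an assumption standing for SIGKILL, or any signal whose default action terminates the process and that has no handler installed. As I understand it, that is the case in which the kernel wakes a TASK_KILLABLE sleeper.

```python
from enum import Enum, auto


class Sleep(Enum):
    """Toy stand-ins for the kernel's sleep states (not kernel code)."""
    INTERRUPTIBLE = auto()    # TASK_INTERRUPTIBLE: ps shows 'S'
    UNINTERRUPTIBLE = auto()  # TASK_UNINTERRUPTIBLE: ps shows 'D'
    KILLABLE = auto()         # TASK_KILLABLE: also shown as 'D'


def woken_by_signal(state: Sleep, fatal: bool) -> bool:
    """Would delivering a signal wake a task sleeping in `state`?

    `fatal` means SIGKILL, or a signal whose default action kills the
    process and for which no handler is installed.
    """
    if state is Sleep.INTERRUPTIBLE:
        return True   # any signal wakes it; the sleeping code then bails out
    if state is Sleep.KILLABLE:
        return fatal  # only a signal that will kill the process wakes it
    return False      # only the event it is waiting for can wake it
```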

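This answers your first question. As for your second question: you can't avoid uninterruptible sleeps; they are a normal thing (they happen, for instance, every time a process reads from or writes to the disk). However, they should last only a fraction of a second. If they last much longer, it usually means a hardware problem (or a device driver problem, which looks the same to the kernel), where the device driver is waiting for the hardware to do something that will never happen. It can also mean you are using NFS and the NFS server is down, so the client is waiting for the server to recover. You can use the "intr" mount option to avoid that problem, although on kernels after 2.6.25 the option is ignored and only SIGKILL can interrupt a pending NFS operation.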

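Finally, the reason you cannot recover is the same reason the kernel waits until the return to user mode to deliver a signal or kill the process: doing otherwise could corrupt the kernel's data structures. Code waiting in an interruptible sleep can receive an error that tells it to return to user space, where the process can be killed. Code waiting in an uninterruptible sleep is not expecting any error.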

倒数 2024-07-15 08:59:30

Uninterruptible processes are usually waiting for I/O following a page fault.

Consider this:

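The thread tries to access a page that is not in core (either part of an executable that is demand-loaded, a page of anonymous memory that has been swapped out, or a page of an mmap()'d file that is demand-loaded, which are much the same thing).
The kernel is now (trying to) load it in.
The process can't continue until the page is available.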
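The page-fault path can be observed from user space. This is a minimal sketch for Linux that touches each page of a fresh anonymous mapping once and counts the resulting page faults with getrusage(). These are minor faults, which the kernel satisfies from RAM. A major fault follows the same path but must wait for disk or network I/O, and that wait is where the task sits in uninterruptible sleep. Note that no system call appears in the loop: the faults are triggered by ordinary memory writes. The function name and page count are illustrative.

```python
import mmap
import resource


def minor_faults_from_touching(pages: int = 256) -> int:
    """Touch fresh anonymous pages once each; return the page faults this caused."""
    size = pages * mmap.PAGESIZE
    # Private anonymous mapping: no physical page exists until first access.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for offset in range(0, size, mmap.PAGESIZE):
            buf[offset] = 1  # first write to each page traps into the kernel
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    finally:
        buf.close()
    return after - before
```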

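The process/task cannot be interrupted in this state, because it can't handle any signals; if it did, another page fault would happen and it would be back where it was.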

When I say "process", I really mean "task", which under Linux (2.6) roughly translates to "thread", which may or may not have an individual "thread group" entry in /proc.

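In some cases, it may be waiting for a long time. A typical example is when the executable or mmap'd file is on a network file system whose server has failed. If the I/O eventually succeeds, the task will continue. If it eventually fails, the task will generally get a SIGBUS or something similar.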
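The SIGBUS outcome can be reproduced without a failing server, with one substitution flagged here. Instead of an I/O error, this minimal sketch truncates a file after mapping it. The page fault on the mapping then has no data to bring in, so the kernel delivers SIGBUS. The faulting code runs in a child interpreter because the signal kills the process. The names CHILD and exit_status_of_unbackable_fault are hypothetical, and the sketch assumes Linux.

```python
import mmap
import os
import signal
import subprocess
import sys
import tempfile

# Child program: map one page of a file, then remove the file's data underneath it.
CHILD = """
import mmap, os, sys
with open(sys.argv[1], "r+b") as f:
    m = mmap.mmap(f.fileno(), mmap.PAGESIZE)
    os.ftruncate(f.fileno(), 0)  # the mapped page no longer has any backing data
    m[0]                         # page fault the kernel cannot satisfy
"""


def exit_status_of_unbackable_fault() -> int:
    """Run CHILD in a separate interpreter and return its exit status."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * mmap.PAGESIZE)
    try:
        child = subprocess.run([sys.executable, "-c", CHILD, tmp.name])
        return child.returncode  # negative value = killed by that signal number
    finally:
        os.unlink(tmp.name)


if __name__ == "__main__":
    status = exit_status_of_unbackable_fault()
    print(signal.Signals(-status).name if status < 0 else status)
```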

回忆躺在深渊里 2024-07-15 08:59:30

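For your third question:
I think you can kill uninterruptible processes by running
sudo kill -HUP 1
It will restart init without ending the running processes. After I ran it, my uninterruptible processes were gone.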

GRAY°灰色天空 2024-07-15 08:59:30

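If you are talking about "zombie" processes (designated as "zombie" in ps output), then this is a harmless record in the process list waiting for someone to collect its return code, and it can safely be ignored.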
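The difference from an uninterruptible process shows up in the state letter: a zombie is Z, while an uninterruptible sleeper is D. This is a minimal Linux sketch, assuming the true command is available. It starts a child that exits immediately and reads the child's state from /proc while the parent has not yet called wait(). It then reaps the child, which collects the return code and removes the entry. The function name is illustrative.

```python
from __future__ import annotations

import subprocess
import time


def zombie_state_then_reap(timeout: float = 5.0) -> tuple[str, int]:
    """Start a child that exits at once; read its state before reaping it."""
    child = subprocess.Popen(["true"])
    state = "?"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(f"/proc/{child.pid}/stat") as f:
            state = f.read().rpartition(")")[2].split()[0]
        if state == "Z":  # exited, but its exit status has not been collected yet
            break
        time.sleep(0.01)
    returncode = child.wait()  # collecting the status removes the zombie entry
    return state, returncode
```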

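Could you describe what an "uninterruptible process" is? Does it survive "kill -9" and happily keep going? If so, then it is stuck in some system call, which is stuck in some driver, and you are stuck with this process until a reboot (and sometimes it is better to reboot soon) or until the relevant driver is unloaded (which is unlikely to happen). You could try using "strace" to find out where your process is stuck and avoid it in the future.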
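Besides strace, Linux reports the system call a blocked task is sitting in through /proc/<pid>/syscall (documented in proc(5)). The first field is the system-call number, -1 if the task is blocked outside a system call, or the word running. Here is a minimal parsing sketch with hypothetical helper names. Syscall numbers are architecture-specific (0 is read() on x86-64), and reading the file requires ptrace-level access to the target.

```python
from __future__ import annotations


def parse_syscall(text: str) -> int | None:
    """Interpret /proc/<pid>/syscall: the syscall number, -1, or None if running."""
    fields = text.split()
    if not fields or fields[0] == "running":
        return None  # running (or unreadable), so there is no blocked call to report
    return int(fields[0])  # -1 means blocked, but not inside a system call


def current_syscall(pid: int) -> int | None:
    # Needs ptrace-level access to the target, e.g. same user or root.
    with open(f"/proc/{pid}/syscall") as f:
        return parse_syscall(f.read())
```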
