Linux process states

Published 2024-08-05 23:31:37

In Linux, what happens to the state of a process when it needs to read blocks from a disk? Is it blocked? If so, how is another process chosen to execute?

Comments (8)

用心笑 2024-08-12 23:31:37

When a process needs to fetch data from a disk, it effectively stops running on the CPU to let other processes run because the operation might take a long time to complete – at least 5ms seek time for a disk is common, and 5ms is 10 million CPU cycles, an eternity from the point of view of the program!

From the programmer's point of view (also called "in userspace"), this is called a blocking system call. If you call write(2) (which is a thin libc wrapper around the system call of the same name), your process does not exactly stop at that boundary; it continues, in the kernel, running the system call code. Most of the time it goes all the way down to a specific disk controller driver (filename → filesystem/VFS → block device → device driver), where a command to fetch a block on disk is submitted to the proper hardware, which is a very fast operation most of the time.

THEN the process is put in sleep state (in kernel space, blocking is called sleeping – nothing is ever 'blocked' from the kernel point of view). It will be awakened once the hardware has finally fetched the proper data, then the process will be marked as runnable and will be scheduled. Eventually, the scheduler will run the process.

Finally, in userspace, the blocking system call returns with proper status and data, and the program flow goes on.

It is possible to invoke most I/O system calls in non-blocking mode (see O_NONBLOCK in open(2) and fcntl(2)). In this case, the system calls return immediately and only report that the disk operation has been submitted. The programmer will have to explicitly check at a later time whether the operation completed, successfully or not, and fetch its result (e.g., with select(2)). This is called asynchronous or event-based programming.

Most answers here mentioning the D state (which is called TASK_UNINTERRUPTIBLE in the Linux state names) are incorrect. The D state is a special sleep mode which is only triggered in a kernel space code path, when that code path can't be interrupted (because it would be too complex to program), with the expectation that it would block only for a very short time. I believe that most "D states" are actually invisible; they are very short-lived and can't be observed by sampling tools such as 'top'.

You can encounter unkillable processes in the D state in a few situations. NFS is famous for that, and I've encountered it many times. I think there's a semantic clash between some VFS code paths, which assume to always reach local disks and fast error detection (on SATA, an error timeout would be around a few hundred milliseconds), and NFS, which actually fetches data from the network, which is more resilient and has slow recovery (a TCP timeout of 300 seconds is common). Read this article for the cool solution introduced in Linux 2.6.25 with the TASK_KILLABLE state. Before this era there was a hack where you could actually send signals to NFS client processes by sending a SIGKILL to the kernel thread rpciod, but forget about that ugly trick.…

束缚m 2024-08-12 23:31:37

While waiting for a read() from or write() to a file descriptor to return, the process will be put in a special kind of sleep, known as "D" or "Disk Sleep". This is special because the process cannot be killed or interrupted while in such a state. A process waiting for a return from ioctl() would also be put to sleep in this manner.

An exception to this is when a file (such as a terminal or other character device) is opened in O_NONBLOCK mode, passed when it's assumed that a device (such as a modem) will need time to initialize. However, you indicated block devices in your question. Also, I have never tried an ioctl() that is likely to block on an fd opened in non-blocking mode (at least not knowingly).

How another process is chosen depends entirely on the scheduler you are using, as well as what other processes might have done to modify their weights within that scheduler.

Some user space programs under certain circumstances have been known to remain in this state forever, until rebooted. These are typically grouped in with other "zombies", but the term would not be correct as they are not technically defunct.

野鹿林 2024-08-12 23:31:37

A process performing I/O will be put in D state (uninterruptible sleep), which frees the CPU until there is a hardware interrupt which tells the CPU to return to executing the program. See man ps for the other process states.

Depending on your kernel, there is a process scheduler, which keeps track of a runqueue of processes ready to execute. It, along with a scheduling algorithm, tells the kernel which process to assign to which CPU. There are kernel processes and user processes to consider. Each process is allocated a time-slice, which is a chunk of CPU time it is allowed to use. Once the process uses all of its time-slice, it is marked as expired and given lower priority in the scheduling algorithm.

In the 2.6 kernel, there is an O(1) time complexity scheduler, so no matter how many processes you have running, it will assign CPUs in constant time. It is more complicated though, since 2.6 introduced preemption and CPU load balancing is not an easy algorithm. In any case, it's efficient and CPUs will not remain idle while you wait for the I/O.

喜爱纠缠 2024-08-12 23:31:37

As already explained by others, processes in the "D" state (uninterruptible sleep) are responsible for hangs of the ps process. To me it has happened many times with RedHat 6.x and automounted NFS home directories.

To list processes in D state you can use the following commands:

cd /proc
for i in [0-9]*; do echo -n "$i: "; grep '^State' "$i/status"; done | grep D

To know the current directory of the process and, possibly, the mounted NFS disk that has issues, you can use a command similar to the following example (replace 31134 with the sleeping process number):

# ls -l /proc/31134/cwd
lrwxrwxrwx 1 pippo users 0 Aug  2 16:25 /proc/31134/cwd -> /auto/pippo

I found that giving the umount command with the -f (force) switch, on the related mounted NFS file system, was able to wake up the sleeping process:

umount -f /auto/pippo

The file system wasn't unmounted, because it was busy, but the related process did wake up and I was able to solve the issue without rebooting.

沧笙踏歌 2024-08-12 23:31:37

Assuming your process is a single thread, and that you're using blocking I/O, your process will block waiting for the I/O to complete. The kernel will pick another process to run in the meantime based on niceness, priority, last run time, etc. If there are no other runnable processes, the kernel won't run any; instead, it'll tell the hardware the machine is idle (which will result in lower power consumption).

Processes that are waiting for I/O to complete typically show up in state D in, e.g., ps and top.

衣神在巴黎 2024-08-12 23:31:37

Yes, the task gets blocked in the read() system call. Another task which is ready runs, or if no other tasks are ready, the idle task (for that CPU) runs.

A normal, blocking disk read causes the task to enter the "D" state (as others have noted). Such tasks contribute to the load average, even though they're not consuming the CPU.

Some other types of IO, especially ttys and network, do not behave quite the same - the process ends up in "S" state and can be interrupted and doesn't count against the load average.

对你的占有欲 2024-08-12 23:31:37

Yes, tasks waiting for IO are blocked, and other tasks get executed. Selecting the next task is done by the Linux scheduler.

幼儿园老大 2024-08-12 23:31:37

Generally the process will block. If the read operation is on a file descriptor marked as non-blocking or if the process is using asynchronous IO it won't block. Also if the process has other threads that aren't blocked they can continue running.

The decision as to which process runs next is up to the scheduler in the kernel.
