Return code when the OOM killer kills a process

Posted on 2024-12-01 12:13:49

I am running a multiprogrammed workload (based on SPEC CPU2006 benchmarks) on a POWER7 system running SUSE SLES 11.

Sometimes, each application in the workload consumes a significant amount of memory and the total memory footprint exceeds the available memory installed in the system (32 GB).

I disabled swap, since otherwise the measurements for any process that ends up swapping could be heavily distorted. I know that by doing so the kernel, through the OOM killer, may kill some of the processes. That is totally fine. The problem is that I would expect a process killed by the kernel to exit with an error condition (e.g., the process was terminated by a signal).

I have a framework that launches all the processes and then waits for them using

waitpid(pid, &status, 0);

Even if a process is killed by the OOM killer (I know this because I get a message on the screen and in /var/log/messages), the call

WIFEXITED(status);

returns one, and the call

WEXITSTATUS(status);

returns zero. Therefore, I cannot distinguish a process that finished correctly from one that was killed by the OOM killer.

Am I doing anything wrong? Do you know of any way to detect when a process has been killed by the OOM killer?

I found this post asking pretty much the same question. However, since it is an old post and answers were not satisfactory, I decided to post a new question.

情话已封尘 2024-12-08 12:13:49

The Linux OOM killer works by sending SIGKILL. If your process was killed by the OOM killer, it's fishy that WIFEXITED returns 1.

From TLPI (The Linux Programming Interface):

"To kill the selected process, the OOM killer delivers a SIGKILL signal."

So you should be able to test this using:

if (WIFSIGNALED(status)) {
    if (WTERMSIG(status) == SIGKILL)
        printf("Killed by SIGKILL\n");
}