Return code when the OOM killer kills a process
I am running a multiprogrammed workload (based on SPEC CPU2006 benchmarks) on a POWER7 system using SUSE SLES 11.
Sometimes, each application in the workload consumes a significant amount of memory and the total memory footprint exceeds the available memory installed in the system (32 GB).
I disabled swap, since otherwise the measurements could be heavily affected for the processes that use it. I know that by doing so the kernel, through the OOM killer, may kill some of the processes. That is totally fine. The problem is that I would expect a process killed by the kernel to exit with an error condition (e.g., the process was terminated by a signal).
I have a framework that launches all the processes and then waits for them using
waitpid(pid, &status, 0);
Even if a process is killed by the OOM killer (I know this because I get a message on the screen and in /var/log/messages), the call
WIFEXITED(status);
returns one, and the call
WEXITSTATUS(status);
returns zero. Therefore, I am not able to distinguish when a process finishes correctly and when it is killed by the OOM killer.
Am I doing anything wrong? Do you know of any way to detect when a process has been killed by the OOM killer?
I found this post asking pretty much the same question. However, since it is an old post and answers were not satisfactory, I decided to post a new question.
Comments (1)
The Linux OOM killer works by sending SIGKILL. If your process is killed by the OOM killer, it's fishy that WIFEXITED returns 1. See TLPI (The Linux Programming Interface).
So you should be able to test this using:
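A minimal sketch of such a test, assuming a POSIX system. The names `child_fate`, `classify_status` and `run_child` are illustrative and not from the original post, and `run_child` only mimics an OOM kill by having the child send itself SIGKILL:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of how a child ended (names are illustrative). */
enum child_fate { CHILD_EXITED, CHILD_SIGKILLED, CHILD_OTHER_SIGNAL, CHILD_UNKNOWN };

/* Decode a status word that was filled in by waitpid(). */
static enum child_fate classify_status(int status)
{
    if (WIFEXITED(status))
        return CHILD_EXITED;        /* normal exit; WEXITSTATUS(status) holds the code */
    if (WIFSIGNALED(status))        /* terminated by a signal */
        return WTERMSIG(status) == SIGKILL ? CHILD_SIGKILLED : CHILD_OTHER_SIGNAL;
    return CHILD_UNKNOWN;           /* neither exited nor signaled */
}

/* Demo helper (an assumption, not part of the asker's framework):
 * fork a child that either calls _exit(exit_code) or, if self_kill is
 * nonzero, sends itself SIGKILL as a stand-in for the OOM killer.
 * Stores the waitpid() status in *status; returns 0 on success, -1 on error. */
static int run_child(int self_kill, int exit_code, int *status)
{
    assert(status != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (self_kill)
            raise(SIGKILL);         /* child dies here; nothing below runs */
        _exit(exit_code);
    }
    return waitpid(pid, status, 0) < 0 ? -1 : 0;
}
```

In a launcher like the one described in the question, `classify_status(status)` would be called right after `waitpid(pid, &status, 0)`. A `CHILD_SIGKILLED` result is consistent with an OOM kill but does not prove it, since any sufficiently privileged process can send SIGKILL; the kernel log remains the confirming source.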