Faster forking of large processes on Linux?
What's the fastest, best way on modern Linux of achieving the same effect as a fork-execve combo from a large process?
My problem is that the forking process is ~500 MB in size, and a simple benchmark achieves only about 50 forks/s from it (cf. ~1600 forks/s from a minimally sized process), which is too slow for the intended application.
Some googling turns up vfork as having been invented as the solution to this problem... but also warnings not to use it. Modern Linux seems to have acquired the related clone and posix_spawn calls; are these likely to help? What's the modern replacement for vfork?
I'm using 64-bit Debian Lenny on an i7 (the project could move to Squeeze if posix_spawn would help).
Comments (5)
On Linux, you can use posix_spawn(3) with the POSIX_SPAWN_USEVFORK flag to avoid the overhead of copying page tables when forking from a large process.
See Minimizing Memory Usage for Creating Application Subprocesses for a good summary of posix_spawn(3), its advantages and some examples.
To take advantage of vfork(2), make sure you #define _GNU_SOURCE before #include <spawn.h>, and then simply call posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK).
I can confirm that this works on Debian Lenny, and provides a massive speed-up when forking from a large process.
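As a rough illustration of the steps in this answer, here is a minimal sketch in C. The function name spawn_and_wait is hypothetical and not from the original answer. The #ifdef guard is there because POSIX_SPAWN_USEVFORK is a glibc extension; note also that since glibc 2.24 the flag has no effect, because posix_spawn always uses a vfork-like clone there.

```c
#define _GNU_SOURCE            /* must precede <spawn.h> to expose POSIX_SPAWN_USEVFORK */
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] with arguments argv and wait for it.
 * Returns the child's exit status, or -1 on any failure. */
int spawn_and_wait(char *const argv[])
{
    posix_spawnattr_t attr;
    pid_t pid;
    int status;
    int rc;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;

#ifdef POSIX_SPAWN_USEVFORK
    /* Ask glibc to create the child with vfork semantics (no page-table copy). */
    if (posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK) != 0) {
        posix_spawnattr_destroy(&attr);
        return -1;
    }
#endif

    rc = posix_spawn(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argument vector such as {"/bin/sh", "-c", "exit 3", NULL} should return 3, the shell's exit status.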
Outcome: I was going to go down the early-spawned helper subprocess route as suggested by other answers here, but then I came across this on using huge page support to improve fork performance.
Having tried it myself, using libhugetlbfs to simply make all my app's mallocs allocate huge pages, I'm now getting around 2400 forks/s regardless of the process size (over the range I'm interested in, anyway). Amazing.
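The answer above used libhugetlbfs to back malloc with huge pages. As a different, swapped-in illustration of the same idea (fewer page-table entries to copy at fork time), the sketch below requests a huge-page mapping directly with mmap and MAP_HUGETLB, falling back to normal pages if that fails. The names alloc_prefer_huge and round_up_huge, and the 2 MiB huge page size, are assumptions for illustration; MAP_HUGETLB needs a kernel newer than the one shipped with Lenny, and huge pages must be reserved by the administrator.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (typical on x86-64); check /proc/meminfo. */
#define ASSUMED_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Round len up to a multiple of the assumed huge page size. */
size_t round_up_huge(size_t len)
{
    return (len + ASSUMED_HUGE_PAGE_SIZE - 1) / ASSUMED_HUGE_PAGE_SIZE
           * ASSUMED_HUGE_PAGE_SIZE;
}

/* Hypothetical helper: map anonymous memory, preferring huge pages.
 * Sets *got_huge to 1 if huge pages were used, 0 otherwise.
 * Returns NULL on failure. Unmap with munmap(p, round_up_huge(len)). */
void *alloc_prefer_huge(size_t len, int *got_huge)
{
    size_t rounded = round_up_huge(len);
    void *p;

    *got_huge = 0;
#ifdef MAP_HUGETLB
    p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *got_huge = 1;
        return p;
    }
#endif
    /* Fallback: ordinary pages, e.g. when no huge pages are reserved. */
    p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Whether this helps in a given application depends on how much of the process's memory can be moved into such mappings; the libhugetlbfs route in the answer has the advantage of requiring no code changes.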
Did you actually measure how much time the forks take? Quoting the page you linked,
So the number of forks per second doesn't really show how big the overhead will be. You should measure the time consumed by forks and (this is generic advice) only by the forks you actually perform, rather than benchmarking maximum performance.
But if you really do find that forking a large process is slow, you may spawn a small ancillary process, pipe the master process to its input, and have it receive the commands to exec from the master. The small process will fork and exec these commands.
posix_spawn()
This function, as far as I understand, is implemented via fork/exec on desktop systems. However, in embedded systems (particularly those without an MMU on board), processes are spawned via a syscall whose interface is posix_spawn or a similar function. Quoting the informative section of the POSIX standard describing posix_spawn:
I don't think that you will benefit from this function on the desktop if your goal is to minimize time consumption.
If you know the number of subprocesses ahead of time, it might be reasonable to pre-fork your application on startup and then distribute the execv information via a pipe. Alternatively, if there is some sort of "lull" in your program, it might be reasonable to fork a subprocess or two ahead of time for quick turnaround later. Neither of these options would directly solve the problem, but if either approach is suitable for your app, it might allow you to side-step the issue.
I've come across this blog post: http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/
Excerpt:
However I think it may still be subject to the setuid problem: