Faster forking of large processes on Linux?

Posted on 2024-08-30 06:27:49

What is the fastest, best way on modern Linux to achieve the same effect as a fork-execve combo from a large process?

My problem is that the forking process is ~500 MB in size, and a simple benchmark achieves only about 50 forks/s from it (cf. ~1600 forks/s from a minimally sized process), which is too slow for the intended application.

Some googling turns up vfork as having been invented as the solution to this problem... but also warnings not to use it. Modern Linux seems to have acquired the related clone and posix_spawn calls; are these likely to help? What is the modern replacement for vfork?

I'm using 64-bit Debian Lenny on an i7 (the project could move to Squeeze if posix_spawn would help).

Comments (5)

纸短情长 2024-09-06 06:27:49

On Linux, you can use posix_spawn(3) with the POSIX_SPAWN_USEVFORK flag to avoid the overhead of copying page tables when forking from a large process.

See "Minimizing Memory Usage for Creating Application Subprocesses" for a good summary of posix_spawn(3), its advantages, and some examples.

To take advantage of vfork(2), make sure you #define _GNU_SOURCE before #include <spawn.h>, and then simply call posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK).
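
A minimal sketch of the call sequence described above, not the answerer's benchmark code. The helper name spawn_and_wait is hypothetical. The flag is wrapped in #ifdef because it is a glibc extension; on newer glibc (2.24 and later, according to the posix_spawn man page) it is documented as having no effect, since posix_spawn there already avoids copying the parent's page tables.

```c
#define _GNU_SOURCE   /* must come before <spawn.h> to expose POSIX_SPAWN_USEVFORK */
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH), wait for it, and
 * return its exit status, or -1 if spawning or waiting failed. */
int spawn_and_wait(char *const argv[])
{
    posix_spawnattr_t attr;
    pid_t pid;
    int status;
    int rc;

    assert(argv != NULL && argv[0] != NULL);

    if (posix_spawnattr_init(&attr) != 0)
        return -1;

#ifdef POSIX_SPAWN_USEVFORK
    /* Ask glibc for vfork-style process creation (glibc extension). */
    if (posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK) != 0) {
        posix_spawnattr_destroy(&attr);
        return -1;
    }
#endif

    rc = posix_spawnp(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait with an argv such as {"sh", "-c", "exit 3", NULL} should return the child's exit status.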

I can confirm that this works on Debian Lenny, and provides a massive speed-up when forking from a large process.

benchmarking the various spawns over 1000 runs at 100M RSS
                            user     system      total        real
fspawn (fork/exec):     0.100000  15.460000  40.570000 ( 41.366389)
pspawn (posix_spawn):   0.010000   0.010000   0.540000 (  0.970577)

您的好友蓝忘机已上羡 2024-09-06 06:27:49

Outcome: I was going to go down the early-spawned helper subprocess route suggested by other answers here, but then I came across a discussion of using huge page support to improve fork performance.

Having tried it myself using libhugetlbfs to simply make all my app's mallocs allocate huge pages, I'm now getting around 2400 forks/s regardless of the process size (over the range I'm interested in anyway). Amazing.

靖瑶 2024-09-06 06:27:49

Did you actually measure how much time the forks take? Quoting the page you linked:

Linux never had this problem; because Linux used copy-on-write semantics internally, Linux only copies pages when they changed (actually, there are still some tables that have to be copied; in most circumstances their overhead is not significant)

So the number of forks per second doesn't really show how big the overhead will be. You should measure the time consumed by forks and, as general advice, only by the forks you actually perform, rather than benchmarking maximum performance.
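
One way to take such a measurement is sketched below. It assumes that timing a bare fork() followed by an immediate _exit() in the child is representative enough. The name time_forks and the use of an anonymous mmap region as "ballast" are illustrative choices, not anything from the question.

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: map and touch `bytes` of anonymous memory so the
 * process has that much resident data, then return the average wall-clock
 * seconds per fork()+waitpid() cycle over `iterations` forks.
 * Returns -1.0 on failure. */
double time_forks(size_t bytes, int iterations)
{
    struct timespec start, end;
    double elapsed;
    char *ballast;
    int i;

    assert(iterations > 0);

    ballast = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ballast == MAP_FAILED)
        return -1.0;
    memset(ballast, 1, bytes);   /* touch every page so it is really mapped */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            munmap(ballast, bytes);
            return -1.0;
        }
        if (pid == 0)
            _exit(0);            /* child: leave immediately */
        if (waitpid(pid, NULL, 0) < 0) {
            munmap(ballast, bytes);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    munmap(ballast, bytes);

    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed / iterations;
}
```

Comparing the result for a small and a large `bytes` value gives a rough per-fork cost attributable to process size on the machine at hand.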

But if you do find that forking a large process is slow, you can spawn a small ancillary process early, connect the main process to its input through a pipe, and send it the commands to exec. The small process will then fork and exec these commands.
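
The helper-process idea could look roughly like the sketch below. Several details are assumptions made for illustration: the names start_helper and send_command, a newline-terminated command protocol, running each command through /bin/sh -c, and having the helper wait for each command before reading the next one.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical API. Call this early, while the process is still small.
 * Returns the helper's pid and stores the write end of the command pipe
 * in *cmd_fd; returns -1 on failure. */
pid_t start_helper(int *cmd_fd)
{
    int fds[2];
    pid_t pid;

    assert(cmd_fd != NULL);
    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid > 0) {               /* main process keeps the write end */
        close(fds[0]);
        *cmd_fd = fds[1];
        return pid;
    }

    /* Helper process: read one command per line and fork/exec it. */
    close(fds[1]);
    FILE *in = fdopen(fds[0], "r");
    char *line = NULL;
    size_t cap = 0;
    while (in != NULL && getline(&line, &cap, in) > 0) {
        line[strcspn(line, "\n")] = '\0';
        pid_t child = fork();    /* cheap: the helper itself is small */
        if (child == 0) {
            execl("/bin/sh", "sh", "-c", line, (char *)NULL);
            _exit(127);          /* exec failed */
        }
        if (child > 0)
            waitpid(child, NULL, 0);
    }
    _exit(0);                    /* pipe closed: main process is done */
}

/* Hypothetical API: send one shell command line to the helper. */
int send_command(int cmd_fd, const char *command)
{
    size_t len = strlen(command);

    if (write(cmd_fd, command, len) != (ssize_t)len)
        return -1;
    return write(cmd_fd, "\n", 1) == 1 ? 0 : -1;
}
```

Closing the write end of the pipe makes the helper exit after it has finished the commands already sent. A real design would also need a way to report exit statuses back to the main process, which this sketch leaves out.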

posix_spawn()

This function, as far as I understand, is implemented via fork/exec on desktop systems. However, on embedded systems (particularly those without an MMU), processes are spawned via a syscall whose interface is posix_spawn or a similar function. Quoting the informative section of the POSIX standard describing posix_spawn:

  • Swapping is generally too slow for a realtime environment.

  • Dynamic address translation is not available everywhere that POSIX might be useful.

  • Processes are too useful to simply option out of POSIX whenever it must run without address translation or other MMU services.

Thus, POSIX needs process creation and file execution primitives that can be efficiently implemented without address translation or other MMU services.

I don't think that you will benefit from this function on desktop if your goal is to minimize time consumption.

我不是你的备胎 2024-09-06 06:27:49

If you know the number of subprocesses ahead of time, it might be reasonable to pre-fork your application on startup and then distribute the execv information via a pipe. Alternatively, if there is some sort of "lull" in your program, it might be reasonable to fork a subprocess or two ahead of time for quick turnaround later. Neither of these options directly solves the problem, but if either approach suits your app, it might allow you to side-step the issue.

可可 2024-09-06 06:27:49

I've come across this blog post: http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/

pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD, arg);

Excerpt:

The system call clone() comes to the rescue. Using clone() we create a
child process which has the following features:

  • The child runs in the same memory space as the parent. This means that no memory structures are copied when the child process is
    created. As a result of this, any change to any non-stack variable
    made by the child is visible by the parent process. This is similar to
    threads, and therefore completely different from fork(), and also very
    dangerous – we don’t want the child to mess up the parent.
  • The child starts from an entry function which is being called right after the child was created. This is like threads, and unlike fork().
  • The child has a separate stack space which is similar to threads and fork(), but entirely different to vfork().
  • The most important: This thread-like child process can call exec().

In a nutshell, by calling clone in the following way, we create a
child process which is very similar to a thread but still can call
exec():
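
The following is not the blog's code. It is a minimal sketch of how the clone() call quoted at the top of this answer might be fleshed out, under a few assumptions: a 64 KiB malloc'd child stack, a downward-growing stack (as on x86), and an absolute program path so that execv() can be used. The names exec_child and clone_exec_wait are hypothetical.

```c
#define _GNU_SOURCE   /* for clone() and CLONE_VM */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* assumed to be enough for execv() */

/* Entry function of the child: runs on its own stack but in the parent's
 * address space, so it must do as little as possible before exec. */
static int exec_child(void *arg)
{
    char **argv = arg;

    execv(argv[0], argv);
    _exit(127);                        /* only reached if exec failed */
}

/* Hypothetical helper: run argv[0] (an absolute path) via
 * clone(CLONE_VM | SIGCHLD), wait for it, and return its exit status,
 * or -1 on failure. */
int clone_exec_wait(char **argv)
{
    char *stack;
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down, so pass the top of the allocation. */
    pid = clone(exec_child, stack + CHILD_STACK_SIZE,
                CLONE_VM | SIGCHLD, argv);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);                       /* child has exec'd or exited by now */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent keeps running while the child shares its memory, the stack must stay allocated until the child has exec'd or exited; this sketch simply waits for the child before freeing it.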

However, I think it may still be subject to the setuid problem:

http://ewontfix.com/7/ "setuid and vfork"

Now we get to the worst of it. Threads and vfork allow you to get in a
situation where two processes are both sharing memory space and
running at the same time. Now, what happens if another thread in the
parent calls setuid (or any other privilege-affecting function)? You
end up with two processes with different privilege levels running in a
shared address space. And this is A Bad Thing.

Consider for example a multi-threaded server daemon, running initially
as root, that’s using posix_spawn, implemented naively with vfork, to
run an external command. It doesn’t care if this command runs as root
or with low privileges, since it’s a fixed command line with fixed
environment and can’t do anything harmful. (As a stupid example, let’s
say it’s running date as an external command because the programmer
couldn’t figure out how to use strftime.)

Since it doesn’t care, it calls setuid in another thread without any
synchronization against running the external program, with the intent
to drop down to a normal user and execute user-provided code (perhaps
a script or dlopen-obtained module) as that user. Unfortunately, it
just gave that user permission to mmap new code over top of the
running posix_spawn code, or to change the strings posix_spawn is
passing to exec in the child. Whoops.
