如何强制执行分叉子项的最大数量?

发布于 2024-07-10 09:58:18 字数 1141 浏览 12 评论 0原文

编辑:我已标记此 C,希望得到更多回复。 我更感兴趣的是理论而不是特定的语言实现。 因此,如果您是 C 编码人员,请将以下 PHP 视为伪代码,并随意回复用 C 编写的答案。

我正在尝试通过让它并行而不是串行执行其任务来加速 PHP CLI 脚本。 这些任务彼此完全独立,因此它们开始/结束的顺序并不重要。

这是原始脚本(请注意,为了清楚起见,所有这些示例都被剥离了):

<?php

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

foreach ($items as $item) {
    do_stuff_with($item);
}

我已设法使其在 < code>$items 与 pcntl_fork() 并行,如下所示:

<?php

ini_set('max_execution_time', 0); 
ini_set('max_input_time', 0); 
set_time_limit(0);

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }   
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}

现在我想扩展此功能,以便最多同时有 10 个孩子处于活动状态。 处理这个问题的最佳方法是什么? 我尝试了一些事情,但运气不佳。

EDIT: I've tagged this C in a hope to get more response. It's more the theory I'm interested in than a specific language implementation. So if you're a C coder please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.

I am trying to speed up a PHP CLI script by having it execute its tasks in parallel instead of serial. The tasks are completely independent of each other so it doesn't matter which order they start/finish in.

Here's the original script (note all these examples are stripped-back for clarity):

<?php

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

foreach ($items as $item) {
    do_stuff_with($item);
}

I've managed to make it work on the $items in parallel with pcntl_fork() as shown below:

<?php

ini_set('max_execution_time', 0); 
ini_set('max_input_time', 0); 
set_time_limit(0);

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }   
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}

Now I want to extend this so there's a maximum of, say, 10 children active at once. What's the best way of handling this? I've tried a few things but haven't had much luck.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

神经大条 2024-07-17 09:58:18

没有系统调用可以获取子 pid 列表,但 ps 可以为您做到这一点。

--ppid 开关将列出您进程的所有子进程,因此您只需计算 ps 输出的行数即可。

或者,您可以维护自己的计数器,假设 ppid 对于 fork'ed 保持不变,则在 fork() 上递增并在 SIGCHLD 信号上递减。处理。

There is no syscall to get a list of child pids, but ps can do it for you.

--ppid switch will list all children for you process so you just need to count number of lines outputted by ps.

Alternatively you can maintain your own counter that you will increment on fork() and decrement on SIGCHLD signal, assuming ppid stays unchanged for fork'ed processed.

﹉夏雨初晴づ 2024-07-17 09:58:18

我能想到的最好的办法是将所有任务添加到队列中,启动所需的最大数量的线程,然后让每个线程从队列中请求一个任务,执行该任务并请求下一个任务。 当没有更多任务要做时,不要忘记让线程终止。

The best thing I can come up with is to add all the tasks to a queue, launch the maximum number of threads you want, and then have each thread requesting a task from the queue, execute the task and requesting the next one. Don't forget to have the threads terminate when there are no more tasks to do.

魂ガ小子 2024-07-17 09:58:18

分叉是一项昂贵的操作。 从表面上看,您真正想要的是多线程,而不是多处理。 不同之处在于,线程比进程轻得多,因为线程共享虚拟地址空间,但进程具有单独的虚拟地址空间。

我不是 PHP 开发人员,但快速 Google 搜索表明 PHP 本身不支持多线程,但有一些库可以完成这项工作。

不管怎样,一旦你弄清楚如何生成线程,你就应该弄清楚要生成多少个线程。 为了做到这一点,您需要知道应用程序的瓶颈是什么。 瓶颈是CPU、内存还是I/O? 您在评论中指出您是网络绑定的,而网络是 I/O 的一种类型。

如果您受 CPU 限制,那么您只能获得与 CPU 内核一样多的并行性; 如果有更多线程,您只是在浪费时间进行上下文切换。 假设您可以计算出总共要生成多少个线程,那么您应该将工作划分为多个单元,并让每个线程独立处理一个单元。

如果您受到内存限制,那么多线程将无济于事。

由于您受 I/O 限制,因此计算要生成多少个线程有点棘手。 如果所有工作项的处理时间大致相同且差异非常小,则您可以通过测量一个工作项花费的时间来估计要生成的线程数。 然而,由于网络数据包往往具有高度可变的延迟,因此这种情况不太可能发生。

一种选择是使用线程池 - 您创建一大堆线程,然后对于要处理的每个项目,您查看池中是否有空闲线程。 如果有,则让该线程执行工作,然后转到下一个项目。 否则,您将等待线程变得可用。 选择线程池的大小很重要——太大,你会浪费时间进行不必要的上下文切换。 太少了,你就太频繁地等待线程。

另一种选择是放弃多线程/多处理,而只进行异步 I/O。 既然您提到您正在使用单核处理器,那么这可能是最快的选择。 您可以使用诸如 socket_select()< /a> 测试套接字是否有可用数据。 如果是,您可以读取数据,否则您将移动到不同的套接字。 这需要做更多的簿记工作,但是当数据在另一套接字上可用时,您可以避免等待数据进入一个套接字。

如果您想避开线程和异步 I/O 并坚持使用多处理,那么如果每个项目的处理足够昂贵,那么它仍然是值得的。 然后你可以像这样进行工作划分:

$my_process_index = 0;
$pids = array();

// Fork off $max_procs processes
for($i = 0; $i < $max_procs - 1; $i++)
{
  $pid = pcntl_fork();
  if($pid == -1)
  {
    die("couldn't fork()");
  }
  elseif($pid > 0)
  {
    // parent
    $my_process_index++;
    $pids[] = $pid
  }
  else
  {
    // child
    break;
  }
}

// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for($i = $my_process_index; $i < length($items); $i += $max_procs)
{
  do_stuff_with($items[$i]);
}

if($my_process_index != 0)
{
  exit(0);
}

Forking is an expensive operation. From the looks of it, what you really want is multithreading, not multiprocessing. The difference is that threads are much lighter weight than processes, since threads share a virtual address space but processes have separate virtual address spaces.

I'm not a PHP developer, but a quick Google search reveals that PHP does not support multithreading natively, but there are libraries to do the job.

Anyways, once you figure out how to spawn threads, you should figure out how many threads to spawn. In order to do this, you need to know what the bottleneck of your application is. Is the bottleneck CPU, memory, or I/O? You've indicated in your comments that you are network-bound, and network is a type of I/O.

If you were CPU bound, you're only going to get as much parallelism as you have CPU cores; any more threads and you're just wasting time doing context switches. Assuming you can figure out how many total threads to spawn, you should divide your work into that many units, and have each thread process one unit independently.

If you were memory bound, then multithreading would not help.

Since you're I/O bound, figuring out how many threads to spawn is a little trickier. If all work items take approximately the same time to process with very low variance, you can estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.

One option is to use thread pools - you create a whole bunch of threads, and then for each item to process, you see if there is a free thread in the pool. If there is, you have that thread perform the work, and you move onto the next item. Otherwise, you wait for a thread to become available. Choosing the size of the thread pool is important - too big, and you're wasting time doing unnecessary context switches. Too few, and you're waiting for threads too often.

Yet another option is to abandon multithreading/multiprocessing and just do asynchronous I/O instead. Since you mentioned you're working on a single-core processor, this will probably be the fastest option. You can use functions like socket_select() to test if a socket has data available. If it does, you can read the data, otherwise you move onto a different socket. This requires doing a lot more bookkeeping, but you avoid waiting for data to come in on one socket when data is available on a different socket.

If you want to eschew threads and asynchronous I/O and stick with multiprocessing, it can still be worthwhile if the per-item processing is expensive enough. You might then do the work division like so:

$my_process_index = 0;
$pids = array();

// Fork off $max_procs processes
for($i = 0; $i < $max_procs - 1; $i++)
{
  $pid = pcntl_fork();
  if($pid == -1)
  {
    die("couldn't fork()");
  }
  elseif($pid > 0)
  {
    // parent
    $my_process_index++;
    $pids[] = $pid
  }
  else
  {
    // child
    break;
  }
}

// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for($i = $my_process_index; $i < length($items); $i += $max_procs)
{
  do_stuff_with($items[$i]);
}

if($my_process_index != 0)
{
  exit(0);
}
街道布景 2024-07-17 09:58:18
<?php

ini_set('max_execution_time', 0); 
ini_set('max_input_time', 0); 
set_time_limit(0);

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
while (count($items)){
    $item = array_pop($items);
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }

    while (count($pids) >= 10){ // limit
        while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
            $status = pcntl_wexitstatus($status);
            array_pop($pids);
            echo "$wait_pid $status".PHP_EOL;
            break;
        }
    }
}

while (count($pids)){
    while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
        $status = pcntl_wexitstatus($status);
        array_pop($pids);
        echo "CHILD: child $status completed $wait_pid".PHP_EOL;
        break;
    }
}
<?php

ini_set('max_execution_time', 0); 
ini_set('max_input_time', 0); 
set_time_limit(0);

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
while (count($items)){
    $item = array_pop($items);
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }

    while (count($pids) >= 10){ // limit
        while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
            $status = pcntl_wexitstatus($status);
            array_pop($pids);
            echo "$wait_pid $status".PHP_EOL;
            break;
        }
    }
}

while (count($pids)){
    while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
        $status = pcntl_wexitstatus($status);
        array_pop($pids);
        echo "CHILD: child $status completed $wait_pid".PHP_EOL;
        break;
    }
}
阪姬 2024-07-17 09:58:18

man 2 setrlimit

这将是针对每个用户的,可能正是您想要的。

man 2 setrlimit

That's going to be per-user which may be what you want anyway.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文