在 PHP 中的子进程之间共享变量?

发布于 2024-12-23 15:06:48 字数 7428 浏览 3 评论 0原文

我确信我正在尝试的事情非常简单,但我以前从未完全使用过多线程,所以我不知道从哪里开始。

我正在使用 PCNTL 创建多线程 PHP 应用程序。我想要做的是让 3 个函数同时运行,并且我希望将它们的返回值合并到一个数组中。因此,从逻辑上讲,我需要在所有子级之间共享一些变量并将结果附加到其中,或者仅在单个子级和父级之间共享三个变量 - 然后父级可以稍后合并结果。

问题是 - 我不知道该怎么做。首先想到的是使用 共享内存,但我觉得就像应该有一个更简单的方法。

另外,如果它有任何效果,则分叉进程的函数是公共类方法。所以我的代码如下所示:

<?php
    class multithreaded_search {
        /* ... */
        /* Constructors and such */
        /* ... */
        public function search( $string = '' ) {
            $search_types = array( 'tag', 'substring', 'levenshtein' );
            $pids = array();
            foreach( $search_types as $type ) {
                $pid = pcntl_fork();
                $pids[$pid] = $type;
                if( $pid == 0 ) { // child process
                    /* confusion */
                    $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                    /* What do we do with $results ? */
                }
            }
            for( $i = 0; $i < count( $pids ); $i++ ) {
                $pid = pcntl_wait();
                /* $pids[$pid] tells me the type of search that just finished */
                /* If we need to merge results in the parent, we can do it here */
            }
            /* Now all children have exited, so the search is complete */
            return $results;
        }
        private function tag_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function substring_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function levenshtein_search( $string ) {
            /* perform one type of search */
            return $results;
        }
    }
?>

那么,在调用 pcntl_fork 创建共享内存并将结果保存在那里之前,我是否需要使用 shmop_open ,或者让子级共享类变量?或者它们只共享全局变量?我确信答案很简单......我只是不知道。

答案(对于任何发现这个的人)

我还有几年的经验,所以我会尝试传授一些知识。

首先,在应用程序中实现多处理时需要了解两个重要区别:

  • 线程进程分叉进程相比
  • 共享内存消息传递

线程、进程、分叉进程

  • 线程:线程的开销非常低,因为它们与父进程在同一进程空间中运行,并共享父进程的内存地址。这意味着创建或销毁线程的操作系统调用更少。如果您打算经常创建和销毁线程,那么线程是“便宜”的替代方案。 PHP 本身不支持线程。然而,从 PHP 7.2 开始,有一些 PHP 扩展(用 C 语言编写)提供了线程功能。例如:pthreads
  • 进程 :进程的开销要大得多,因为操作系统必须为其分配内存,并且对于像 PHP 这样的解释性语言,通常必须在执行您自己的代码之前加载和处理整个运行时。 PHP 确实通过 exec (同步)或 < a href="https://secure.php.net/manual/en/function.proc-open.php" rel="noreferrer">proc_open(异步)
  • 分叉进程:分叉进程将两者之间的差异分开这两种方法。一个单独的进程在当前进程的内存空间中运行。通过 PCNTL 也有对此的本地支持

。工作经常是问一个问题:“你多久会启动额外的线程/进程”?如果不是那么频繁(也许您每小时运行一个批处理作业并且该作业可以并行化),那么进程可能是更简单的解决方案。如果进入服务器的每个请求都需要某种形式的并行计算,并且您每秒收到 100 个请求,那么线程可能是最佳选择。

共享内存,消息传递

  • 共享内存:这是指允许多个线程或进程写入 RAM 的同一部分。这样做的好处是非常快速且易于理解——就像办公空间中的共享白板。任何人都可以读取或写入它。然而,在管理并发性方面它有几个缺点。想象一下,如果两个进程在同一时间写入内存中完全相同的位置,那么第三个进程会尝试读取结果。它会看到什么结果? PHP 通过 shmop 对共享内存提供本机支持,但要正确使用它需要锁、信号量、监视器或其他复杂的系统工程流程
  • 消息传递:这是“热门新事物”™,实际上自 70 年代以来就已经存在。这个想法是,您不写入共享内存,而是写入自己的内存空间,然后告诉其他线程/进程“嘿,我有一条消息给您”。 Go 编程语言有一句与此相关的著名座右铭:“不要通过共享内存来通信,而是通过通信来共享内存”。传递消息的方法有很多种,包括:写入文件、写入套接字、写入标准输出、写入共享内存等。

基本套接字解决方案

首先,我将尝试重新创建 2012 年的解决方案。 @MarcB 向我指出了 UNIX 套接字。此页面明确提到 fsockopen,它将套接字作为文件指针打开。它还在“另请参阅”部分中包含指向 socket_connect,它为您提供了对套接字的较低级别的控制。

当时我可能花了很长时间研究这些 socket_* 函数,直到我得到一些东西。现在我在谷歌上快速搜索了 socket_create_pair 并找到了这个有用的链接来获取你开始

我重写了上面的代码,将结果写入 UNIX 套接字,并将结果读入父线程:

<?php
/*
 * I retained the same public API as my original StackOverflow question,
 * but instead of performing actual searches I simply return static data
 */

class multithreaded_search {
    private $a, $b, $c;
    public function __construct($a, $b, $c) {
        $this->a = $a;
        $this->b = $b;
        $this->c = $c;
    }

    public function search( $string = '' ) {
        $search_types = array( 'tag', 'substring', 'levenshtein' );
        $pids = array();
        $threads = array();
        $sockets = array();
        foreach( $search_types as $type ) {
            /* Create a socket to write to later */
            $sockets[$type] = array();
            socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets[$type]);
            $pid = pcntl_fork();
            $pids[] = $pid;
            $threads[$pid] = $type;
            if( $pid == 0 ) { // child process
                /* no more confusion */
                $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                /* What do we do with $results ? Write them to a socket! */
                $data = serialize($results);
                socket_write($sockets[$type][0], str_pad($data, 1024), 1024);
                socket_close($sockets[$type][0]);
                exit();
            }
        }
        $results = [];
        for( $i = 0; $i < count( $pids ); $i++ ) {
            $pid = $pids[$i];
            $type = $threads[$pid];
            pcntl_waitpid($pid, $status);
            /* $threads[$pid] tells me the type of search that just finished */
            /* If we need to merge results in the parent, we can do it here */
            $one_result = unserialize(trim(socket_read($sockets[$type][1], 1024)));
            $results[] = $one_result;
            socket_close($sockets[$type][1]);
        }
        /* Now all children have exited, so the search is complete */
        return $results;
    }

    private function tag_search() {
        return $this->a;
    }

    private function substring_search() {
        return $this->b;
    }

    private function levenshtein_search() {
        return $this->c;
    }
}

$instance = new multithreaded_search(3, 5, 7);
var_dump($instance->search());

注释

此解决方案使用分叉进程并通过本地(内存中)套接字传递消息。根据您的用例和设置,这可能不是最佳解决方案。例如:

  • 如果您希望将处理拆分到多个单独的服务器并将结果传递回中央服务器,则 create_socket_pair 将不起作用。在这种情况下,您需要创建一个套接字,将套接字绑定到地址和端口,然后调用 socket_listen 等待子服务器的结果。此外,pcntl_fork 无法在多服务器环境中工作,因为进程空间无法在不同机器之间共享。
  • 如果您正在编写命令行应用程序并且更喜欢使用线程,那么您可以使用 pthreads 或抽象 pthreads 的第三方库
  • 如果您不喜欢挖掘杂草,只是想要简单的多处理,而不必担心实现细节,请查看像 Amp/Parallel

I'm sure what I'm trying is very simple, but I've never quite worked with multithreading before so I'm not sure where to start.

I'm using PCNTL to create a multithreaded PHP application. What I wish to do is have 3 functions running concurrently and I want their returned values merged into a single array. So logically I need either some variable shared among all children to which they append their results, or three variables shared only between a single child and the parent - then the parent can merge the results later.

Problem is - I have no idea how to do this. The first thing that comes to mind is using shared memory, but I feel like there should be an easier method.

Also, if it has any effect, the function which forks the process is a public class method. So my code looks something like the following:

<?php
    class multithreaded_search {
        /* ... */
        /* Constructors and such */
        /* ... */
        public function search( $string = '' ) {
            $search_types = array( 'tag', 'substring', 'levenshtein' );
            $pids = array();
            foreach( $search_types as $type ) {
                $pid = pcntl_fork();
                $pids[$pid] = $type;
                if( $pid == 0 ) { // child process
                    /* confusion */
                    $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                    /* What do we do with $results ? */
                }
            }
            for( $i = 0; $i < count( $pids ); $i++ ) {
                $pid = pcntl_wait();
                /* $pids[$pid] tells me the type of search that just finished */
                /* If we need to merge results in the parent, we can do it here */
            }
            /* Now all children have exited, so the search is complete */
            return $results;
        }
        private function tag_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function substring_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function levenshtein_search( $string ) {
            /* perform one type of search */
            return $results;
        }
    }
?>

So will I need to use shmop_open before I call pcntl_fork to create shared memory and save the results there, or do the children share class variables? Or do they only share global variables? I'm sure the answer is easy... I just don't know it.

Answers (for anyone who finds this)

I've got a few more years of experience, so I'll try to impart some knowledge.

First, there are two important distinctions to understand when it comes to implementing multiprocessing in your applications:

  • Threads versus processes versus forked processes
  • Shared memory versus message passing

Threads, processes, forked processes

  • Threads: Threads are very low overhead since they run in the same process space as the parent and share the parent's memory address. This means fewer OS calls in order to create or destroy a thread. Threads are the "cheap" alternative if you plan to be creating and destroying them often. PHP does not have native support for threads. However as of PHP 7.2, there are PHP extensions (written in C) that provide threaded functionality. For example: pthreads
  • Processes: Processes have a much larger overhead because the operating system must allocate memory for it, and in the case of interpreted languages like PHP, there's often a whole runtime that must be loaded and processed before your own code executes. PHP does have native support for spawning processes via exec (synchronous) or proc_open (asynchronous)
  • Forked processes: A forked process splits the difference between these two approaches. A separate process is run in the current processes's memory space. There is also native support for this via PCNTL

Choosing the proper tool for the job often is a matter of asking the question: "How often will you be spinning up additional threads/processes"? If it's not that often (maybe you run a batch job every hour and the job can be parallelized) then processes might be the easier solution. If every request that comes into your server requires some form of parallel computation and you receive 100 requests per second, then threads are likely the way to go.

Shared memory, message passing

  • Shared memory: This is when more than one thread or process is allowed to write to the same section of RAM. This has the benefit of being very fast and easy to understand - it's like a shared whiteboard in an office space. Anyone can read or write to it. However it has several drawbacks when it comes to managing concurrency. Imagine if two processes write to the exact same place in memory at the exact same time, then a third process tries to read the result. Which result will it see? PHP has native support for shared memory via shmop, but to use it correctly requires locks, semaphores, monitors, or other complex systems engineering processes
  • Message passing: This is the "hot new thing"™ that has actually been around since the 70's. The idea is that instead of writing to shared memory, you write into your own memory space and then tell the other threads / processes "hey, I have a message for you". The Go programming language has a famous motto related to this: "Don't communicate by sharing memory, share memory by communicating". There are a multitude of ways to pass messages, including: writing to a file, writing to a socket, writing to stdout, writing to shared memory, etc.

A basic socket solution

First, I'll attempt to recreate my solution from 2012. @MarcB pointed me towards UNIX sockets. This page explicitly mentions fsockopen, which opens a socket as a file pointer. It also includes in the "See Also" section a link to socket_connect, which gives you a bit lower-level control over sockets.

At the time I likely spent a long time researching these socket_* functions until I got something working. Now I did a quick google search for socket_create_pair and found this helpful link to get you started

I've rewritten the code above writing the results to UNIX sockets, and reading the results into the parent thread:

<?php
/*
 * I retained the same public API as my original StackOverflow question,
 * but instead of performing actual searches I simply return static data
 */

class multithreaded_search {
    private $a, $b, $c;
    public function __construct($a, $b, $c) {
        $this->a = $a;
        $this->b = $b;
        $this->c = $c;
    }

    public function search( $string = '' ) {
        $search_types = array( 'tag', 'substring', 'levenshtein' );
        $pids = array();
        $threads = array();
        $sockets = array();
        foreach( $search_types as $type ) {
            /* Create a socket to write to later */
            $sockets[$type] = array();
            socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets[$type]);
            $pid = pcntl_fork();
            $pids[] = $pid;
            $threads[$pid] = $type;
            if( $pid == 0 ) { // child process
                /* no more confusion */
                $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                /* What do we do with $results ? Write them to a socket! */
                $data = serialize($results);
                socket_write($sockets[$type][0], str_pad($data, 1024), 1024);
                socket_close($sockets[$type][0]);
                exit();
            }
        }
        $results = [];
        for( $i = 0; $i < count( $pids ); $i++ ) {
            $pid = $pids[$i];
            $type = $threads[$pid];
            pcntl_waitpid($pid, $status);
            /* $threads[$pid] tells me the type of search that just finished */
            /* If we need to merge results in the parent, we can do it here */
            $one_result = unserialize(trim(socket_read($sockets[$type][1], 1024)));
            $results[] = $one_result;
            socket_close($sockets[$type][1]);
        }
        /* Now all children have exited, so the search is complete */
        return $results;
    }

    private function tag_search() {
        return $this->a;
    }

    private function substring_search() {
        return $this->b;
    }

    private function levenshtein_search() {
        return $this->c;
    }
}

$instance = new multithreaded_search(3, 5, 7);
var_dump($instance->search());

Notes

This solution uses forked processes and message passing over a local (in-memory) socket. Depending on your use case and setup, this may not be the best solution. For instance:

  • If you wish to split the processing among several separate servers and pass the results back to a central server, then create_socket_pair won't work. In this case you'll need to create a socket, bind the socket to an address and port, then call socket_listen to wait for results from the child servers. Furthermore, pcntl_fork wouldn't work in a multi-server environment since a process space can't be shared among different machines
  • If you're writing a command-line application and prefer to use threads, then you can either use pthreads or a third-party library that abstracts pthreads
  • If you don't like digging through the weeds and just want simple multiprocessing without having to worry about the implementation details, looks into a library like Amp/Parallel

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

感情旳空白 2024-12-30 15:06:49

分叉的孩子一旦在任何地方写入,就会获得自己的内存空间专用副本 - 这就是“写时复制”。虽然 shmop 确实提供了对公共内存位置的访问,但实际的 PHP 变量和脚本中定义的内容不会在子级之间共享。

在一个孩子中执行 $x = 7; 不会使其他孩子中的 $x 也变成 7。每个孩子都会有自己专用的 $x,完全独立于其他孩子的副本。

forked children will gain their own dedicated copy of their memory space as soon as they write anywhere to it - this is "copy-on-write". While shmop does provide access to a common memory location, the actual PHP variables and whatnot defined in the script are NOT shared between the children.

Doing $x = 7; in one child will not make the $x in the other children also become 7. Each child will have its own dedicated $x that is completely independent of everyone else's copy.

一世旳自豪 2024-12-30 15:06:49

只要父亲和孩子知道共享内存段的键/键就可以在 pcnlt_fork 之前执行 shmop_open 。但请记住,pcnlt_fork 在子进程中返回 0,在创建子进程失败时返回 -1(检查注释 /confusion/ 附近的代码)。父进程的 $pid 中将包含刚刚创建的子进程的 PID。

在这里检查:

http://php.net/manual/es/function.pcntl -fork.php

As long as father and children know the key/keys of the shared memory segment is ok to do a shmop_open before pcnlt_fork. But remember that pcnlt_fork returns 0 in the child's process and -1 on failure to create the child (check your code near the comment /confusion/). The father will have in $pid the PID of the child process just created.

Check it here:

http://php.net/manual/es/function.pcntl-fork.php

櫻之舞 2024-12-30 15:06:49

使用这个类:
http://pastebin.com/0wnxh4gY

http://framework.zend.com/manual/1.7/en/zendx.console.process.unix.overview.html

它利用 shm 函数通过 setVariable 在多个进程之间共享变量方法...显然你应该使用它以某种 cgi 模式运行 PHP,最有可能是 php-fpm

Use this class:
http://pastebin.com/0wnxh4gY

http://framework.zend.com/manual/1.7/en/zendx.console.process.unix.overview.html

It utilizes shm functions to share variables across many processes with the setVariable method...obviously you shoud use it running PHP in some kind of cgi mode most likely php-fpm

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文