PHP multithreading question

Posted on 2024-09-16 00:57:21

I am writing a PHP cron job that reads thousands of feeds and web pages using curl and stores the content in a database. How do I restrict the number of threads to, let's say, 6? That is, even though I need to scan thousands of feeds and web pages, I want only 6 curl threads active at any time so that my server and network don't get bogged down. I could do it easily in Java using the wait, notify, and notifyAll methods of Object. Should I build my own semaphore, or does PHP provide any built-in functions?

Comments (2)

茶花眉 2024-09-23 00:57:21

First of all, PHP doesn't have threads, but it does have process control:
http://php.net/manual/en/book.pcntl.php

I've built a class around these functions to help with my multi-process requirements.
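To make the process-control idea concrete, here is a minimal sketch of a capped fork pool. It assumes the pcntl extension is loaded (CLI only) and is not the class mentioned above; the name `runLimited` and its parameters are made up for illustration.

```php
<?php
// Sketch only: run $work($job) for each job in a forked child, keeping at
// most $maxChildren children alive at any moment. Needs the pcntl extension.
function runLimited(array $jobs, callable $work, int $maxChildren = 6): int
{
    $running = 0;   // children currently alive
    $finished = 0;  // children reaped so far

    foreach ($jobs as $job) {
        // At the cap: block until some child exits, which frees a slot.
        while ($running >= $maxChildren) {
            if (pcntl_wait($status) <= 0) {
                throw new RuntimeException('pcntl_wait failed');
            }
            $running--;
            $finished++;
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: do one unit of work, then leave without re-entering the loop.
            $work($job);
            exit(0);
        }
        $running++; // parent: one more child alive
    }

    // Reap whatever is still running.
    while ($running > 0 && pcntl_wait($status) > 0) {
        $running--;
        $finished++;
    }
    return $finished;
}
```

Each child inherits the parent's open resources, so a database connection would normally be opened inside `$work` rather than before the fork.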

I'm in a similar situation. I'm keeping a log of the processes that get started from cron and their state. I'm checking on them from a related cron job.

EDIT (more details):

In my project I log all the key changes to the database. Actions may then be taken if the changes meet the action's criteria. So what I'm doing is different from what you're doing. However, there are some similarities.

When I fork a new process, I enter its pid in a DB table. Then the next time the cron job kicks in, part of what it does is check whether the processes have completed properly, and then mark the action as completed in that DB table.

You don't give many details about your project. So I will just throw out a suggestion:

  • A DB table holds the URLs of the resources you want to download.
  • Another table holds the pids of the running processes.
  • A cron job that is run every hour will go through the table and download the resource and store it in a DB. However, first it checks the pid table for complete/dead/running processes and acts accordingly. Here you can limit your processes to 6.
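
The bookkeeping in the list above could be sketched roughly as follows. This is an illustration under assumptions: the pid table is modelled as a plain array of pid => URL so that no database schema has to be invented, and the helper names `pidIsAlive` and `freeSlots` are hypothetical.

```php
<?php
// Hypothetical helper: true if a process with this pid exists and may be
// signalled by the current user. Signal 0 performs the check without
// delivering anything (POSIX systems, posix extension).
function pidIsAlive(int $pid): bool
{
    return function_exists('posix_kill') && posix_kill($pid, 0);
}

// Drops rows whose process is gone and reports how many new downloads may
// start without exceeding $maxProcesses.
// $pidTable maps pid => URL; returns [still-running rows, free slots].
function freeSlots(array $pidTable, int $maxProcesses = 6, ?callable $isAlive = null): array
{
    $isAlive = $isAlive ?? 'pidIsAlive';
    // Keep only the rows whose pid (the array key) is still alive.
    $running = array_filter($pidTable, $isAlive, ARRAY_FILTER_USE_KEY);
    $slots = max(0, $maxProcesses - count($running));
    return [$running, $slots];
}
```

The cron job would call `freeSlots()` first, write the pruned rows back to the pid table, and then start at most `$slots` new downloads from the URL table.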

Depending on the size of your project, this may seem like overkill. However, I've thought about it for a long, long time, and I want to keep track of all those forked processes. Forking can be risky business and can lead to system resource overload. I'm speaking from experience ;)

I'd be interested to hear other techniques as well.
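
One further technique that fits the question as asked, offered as a hedged sketch rather than something taken from this thread: the curl extension's multi interface (`curl_multi_*`) can drive several transfers from a single process, so a rolling window of 6 handles gives the "only 6 active at a time" behaviour without forking. The function name `fetchWindowed`, its parameters, and the 40-second default are assumptions made for illustration.

```php
<?php
// Sketch only: fetch every URL, keeping at most $window transfers in flight.
// Returns url => body, or url => null when the transfer failed or timed out.
function fetchWindowed(array $urls, int $window = 6, int $timeout = 40): array
{
    $queue = array_values($urls);
    $results = [];
    $active = 0; // handles currently attached to the multi handle
    $mh = curl_multi_init();

    while ($queue || $active > 0) {
        // Top the window up to $window transfers.
        while ($queue && $active < $window) {
            $url = array_shift($queue);
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT => $timeout,
                CURLOPT_PRIVATE => $url, // remember which URL this handle is for
            ]);
            curl_multi_add_handle($mh, $ch);
            $active++;
        }

        curl_multi_exec($mh, $stillRunning);

        // Collect finished transfers; each one frees a slot in the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $active--;
        }

        // Wait for socket activity instead of spinning.
        if ($active > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(10000);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

For thousands of URLs, storing each body as soon as it is collected (instead of accumulating `$results`) would keep memory use flat.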

歌枕肩 2024-09-23 00:57:21

From my reply at "PHP using proc_open so that it doesn't wait for the script it opens (runs) to finish?"

Here is some of my code from when I played around with proc_open.

I had issues with proc_close (it took 10 to 30 seconds), so I just killed the process using the Linux kill command.

Curl sometimes freezes for me on various servers (Ubuntu, CentOS), but not on all of them, so I kill any "child" process that takes over 40 seconds. Normally the script takes 10 seconds at most, and I'd rather redo the work than wait a minute or so for curl to unfreeze.



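// Assumed helpers: doE() and $child_time_limit are used below but are not defined in the post.
function doE($message) {
        echo $message;
}
$child_time_limit=60; // assumed value, taken from the "php check_working_child.php 5 60" example

$option=array();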
$option['sleep-after-destroy']=0;
$option['sleep-after-create']=0;
$option['age-max']=40;
$option['dir-run']=dirname(__FILE__);
$option['step-sleep']=1;
$option['workers-max']=6;
$option['destroy-forcefull']=1;

$workers=array();

function endAWorker($i,$cansleep=true) {
        global $workers;
        global $option;
        global $child_time_limit;
        if(isset($workers[$i])) {
                @doE('Ending worker [['.$i.']]'."\n");
                if($option['destroy-forcefull']==1) {
                        $x=exec('ps x | grep "php check_working_child.php '.$i.' '.$child_time_limit.'" | grep -v "grep" | grep -v "sh -c"');
                        echo 'pscomm> '.$x."\n";
                        $x=explode(' ',trim(str_replace("\t",' ',$x)));
                        //print_r($x);
                        if(is_numeric($x[0])) {
                                $c='kill -9 '.$x[0];
                                echo 'killcommand> '.$c."\n";
                                $x=exec($c);
                        }
                }
                @proc_close($workers[$i]['link']);
                unset($workers[$i]);
        }
        if($cansleep==true) {
                sleep($option['sleep-after-destroy']);
        }
}

function startAWorker($i) {
        global $workers;
        global $option;
        global $child_time_limit;

        $runcommand='php check_working_child.php '.$i.' '.$child_time_limit.' > check_working_child_logs/'.$i.'.normal.log';
        doE('Starting [['.$i.']]: '.$runcommand."\n");
        $workers[$i]=array(
                'desc' => array(
                        0 => array("pipe", "r"),
                        1 => array("pipe", "w"),
                        2 => array("file", 'check_working_child_logs/'.$i.'.error.log', "a")
                        ),
                'pipes'                 => null,
                'link'                  => null,
                'start-time'    => time() // mktime() without arguments is rejected by PHP 8
                );
        $workers[$i]['link']=proc_open(
                $runcommand,
                $workers[$i]['desc'],
                $workers[$i]['pipes'],
                $option['dir-run']
                );
        sleep($option['sleep-after-create']);
}

function checkAWorker($i) {
        global $workers;
        global $option;
        $temp=proc_get_status($workers[$i]['link']);
        if($temp['running']===false) {
                doE('Worker [['.$i.']] finished'."\n");
                if(is_file('check_working_child_logs/'.$i.'.normal.log') && filesize('check_working_child_logs/'.$i.'.normal.log')>0) {
                        doE('--------'."\n");
                        echo file_get_contents('check_working_child_logs/'.$i.'.normal.log');
                        doE('-------'."\n");
                }
                endAWorker($i);
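        } else {
                if($option['age-max']>0) {
                        // Assumed reconstruction (the source is garbled here): kill a worker that has outlived age-max.
                        if($workers[$i]['start-time']+$option['age-max']<time()) {
                                doE('Worker [['.$i.']] exceeded age-max, killing it'."\n");
                                endAWorker($i);
                        }
                }
        }
}

function endAllWorkers() {
        global $workers;
        foreach($workers as $i=>$v) {
                endAWorker($i,false);
        }
        @doE('Done killing workers.'."\n");
}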

register_shutdown_function('endAllWorkers');

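$step=0;
while(1) {
        $step++;
        foreach($workers as $index=>$v) {
                checkAWorker($index);
        }
        if(count($workers)==$option['workers-max']) {
                // At capacity: nothing to start or stop.
        } elseif(count($workers)<$option['workers-max']) {
                // Assumed reconstruction (this branch is garbled in the source): start a worker in the lowest free slot.
                $free=0;
                while(isset($workers[$free])) {
                        $free++;
                }
                startAWorker($free);
        } elseif(count($workers)>$option['workers-max']) {
                // Too many workers: end the most recently added one.
                $wl=array_keys($workers);
                $wl=array_pop($wl);
                doE('Killing worker [['.$wl.']]'."\n");
                endAWorker($wl);
        }
        sleep($option['step-sleep']); // assumed placement: pause between polling passes
}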

Then create a file named 'check_working_child.php' to do all the work. The first parameter is the instance number and the second is the time limit, so
php check_working_child.php 5 60
means you are the 5th child and are allowed to run for 60 seconds.

If the above code does not run, let me know and I will post it using pastebin or something...
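
The post does not show 'check_working_child.php' itself. A minimal sketch of what its skeleton might look like follows, assuming the two arguments described above. The function names and the fallback defaults are hypothetical, and how a child picks its URLs is left open because the post does not say.

```php
<?php
// check_working_child.php (sketch)
// Usage: php check_working_child.php <instance-number> <time-limit-seconds>

// Reads the instance number and time limit from the command line,
// falling back to assumed defaults when an argument is missing.
function parseChildArgs(array $argv): array
{
    $instance = isset($argv[1]) ? (int) $argv[1] : 0;
    $timeLimit = isset($argv[2]) ? max(1, (int) $argv[2]) : 60;
    return [$instance, $timeLimit];
}

// Fetches one URL and gives up after $timeLimit seconds, so a frozen
// transfer ends on its own before the parent has to kill the process.
// Returns the body as a string, or false on failure or timeout.
function fetchWithTimeout(string $url, int $timeLimit)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => min(10, $timeLimit),
        CURLOPT_TIMEOUT => $timeLimit,
    ]);
    return curl_exec($ch);
}
```

A child would call `parseChildArgs($argv)` first and then pass the time limit to `fetchWithTimeout()` for each URL it handles. Anything it prints goes to the '.normal.log' file that the parent reads back.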
