PHP multithreading question
I am writing a php cron job that reads thousands of feeds / web pages using curl and stores the content in a database. How do I restrict the number of threads to, lets say, 6? i.e., even though I need to scan thousands of feeds / web pages, I want only 6 curl threads active at any time so that my server and network don't get bogged down. I could do it easily in Java using wait, notify, notifyall methods of Object. Should I build my own semaphore or does php provide any built-in functions?
First of all, PHP doesn't have threads, but it does have process control:
http://php.net/manual/en/book.pcntl.php
I've built a class around these functions to help with my multi-process requirements.
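For the 6-at-a-time limit from the question, the core pcntl pattern looks roughly like this (a bare-bones sketch, not the class I mentioned; fetch_feed() and feeds.txt are just placeholders for your own curl-and-store code and URL list):

<?php
// Bare-bones worker-pool sketch: at most 6 children running at once.
// fetch_feed() is a placeholder for your curl + database code.

$feeds       = file('feeds.txt', FILE_IGNORE_NEW_LINES); // your list of URLs
$maxChildren = 6;
$children    = array();

foreach ($feeds as $url) {
    // If 6 children are already running, block until one of them exits.
    while (count($children) >= $maxChildren) {
        $pid = pcntl_wait($status);
        unset($children[$pid]);
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: fetch one feed, then exit.
        fetch_feed($url);   // placeholder: curl the page and store it
        exit(0);
    } else {
        // Parent: remember which child handles which URL.
        $children[$pid] = $url;
    }
}

// Wait for the last children to finish.
while ($children) {
    $pid = pcntl_wait($status);
    unset($children[$pid]);
}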
I'm in a similar situation. I'm keeping a log of the processes that get started from cron and their state. I'm checking on them from a related cron job.
EDIT (more details):
In my project I log all the key changes to the database. Actions may then be taken if the changes meet the action criteria. So what I'm doing is different from what you're doing. However, there are some similarities.
When I fork a new process, I enter its pid in a DB table. Then, the next time the cron job kicks in, part of what it does is check whether the processes have completed properly, and then mark the action as completed in that DB table.
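In sketch form, that check at the start of the cron run could look something like this (the table and column names are made up for illustration; on Linux I treat a missing /proc/<pid> directory as "the process is gone", and a real check would also look at exit codes or the results the child left behind):

<?php
// Sketch: mark DB rows whose forked process is no longer alive.
// Table/column names (forked_jobs, pid, status) are illustrative only.

$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

$rows = $db->query("SELECT id, pid FROM forked_jobs WHERE status = 'running'");
foreach ($rows as $row) {
    // On Linux, a live process has a /proc/<pid> directory.
    if (!file_exists('/proc/' . (int) $row['pid'])) {
        $stmt = $db->prepare("UPDATE forked_jobs SET status = 'completed' WHERE id = ?");
        $stmt->execute(array($row['id']));
    }
}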
You don't give many details about your project. So I will just throw out a suggestion:
Depending on the size of your project, this may seem like overkill. However, I've thought about it for a long, long time, and I want to keep track of all those forked processes. Forking can be risky business and can lead to system resource overload - speaking from experience ;)
I'd be interested to hear other techniques as well.
From my reply at "PHP using proc_open so that it doesn't wait for the script it opens (runs) to finish?":
Some of my code from when I played around with proc_open:
I had issues with proc_close (it would take 10 to 30 seconds), so I just killed the process using the Linux kill command.
Curl sometimes freezes for me on various servers (Ubuntu, CentOS) but not on all of them, so I kill any "child" process that takes over 40 seconds, because normally the script takes 10 seconds at most and I'd rather redo the work than wait a minute or so for curl to un-freeze.
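A launcher loop along those lines looks roughly like this (a rough sketch, not my exact code; the 6-worker limit comes from the question, the 40-second kill rule from what I described above, and $batches just stands in for however many worker runs you need):

<?php
// Rough sketch of the launcher: keep at most 6 workers alive, kill any that
// run longer than 40 seconds. Each worker works out its own URLs from its
// instance number, so the launcher only hands out slot numbers.

$maxWorkers = 6;
$killAfter  = 40;        // seconds before a frozen worker gets killed
$batches    = 100;       // stand-in: number of worker runs still needed
$workers    = array();   // slot => array('proc' => ..., 'pid' => ..., 'start' => ...)
$devNull    = array(1 => array('file', '/dev/null', 'w'),
                    2 => array('file', '/dev/null', 'w'));

while ($batches > 0 || $workers) {
    // Fill free slots.
    while ($batches > 0 && count($workers) < $maxWorkers) {
        $slot = 1;
        while (isset($workers[$slot])) {
            $slot++;
        }
        $cmd  = sprintf('php check_working_child.php %d %d', $slot, $killAfter);
        $proc = proc_open($cmd, $devNull, $pipes);
        $info = proc_get_status($proc);
        $workers[$slot] = array('proc' => $proc, 'pid' => $info['pid'], 'start' => time());
        $batches--;
    }

    // Reap finished workers; kill anything that has been running too long.
    foreach ($workers as $slot => $w) {
        $status = proc_get_status($w['proc']);
        if (!$status['running']) {
            proc_close($w['proc']);                 // exited on its own
            unset($workers[$slot]);
        } elseif (time() - $w['start'] > $killAfter) {
            exec('kill -9 ' . (int) $w['pid']);     // curl froze: kill it and move on
            proc_close($w['proc']);
            unset($workers[$slot]);
        }
    }

    usleep(200000);   // don't spin the CPU while waiting
}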
Then create a file named 'check_working_child.php' to do all the work; the first parameter is the instance number and the second is the time limit:
php check_working_child.php 5 60
means you are the 5th child and are allowed to run for 60 seconds.
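A skeleton for that worker could look something like this (fetch_next_batch() and store_result() are placeholders for however you divide up the URLs and save the results; the per-request curl timeouts are my own addition here, on top of the 40-second kill in the launcher):

<?php
// check_working_child.php - skeleton of the worker.
// $argv[1] = instance number, $argv[2] = time limit in seconds.
// fetch_next_batch() and store_result() are placeholders for your own code.

$instance  = isset($argv[1]) ? (int) $argv[1] : 1;
$timeLimit = isset($argv[2]) ? (int) $argv[2] : 60;

$deadline = time() + $timeLimit;

foreach (fetch_next_batch($instance) as $url) {     // placeholder: URLs for this worker
    if (time() >= $deadline) {
        break;                                      // out of time; stop cleanly
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // per-request limits so one slow
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // page can't eat the whole budget
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body !== false) {
        store_result($url, $body);                  // placeholder: write to the database
    }
}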
If the above code does not run, let me know and I will post it using Pastebin or something...