PHP multithreading, MySQL
I have a PHP script which I use to make about 1 million requests per day to a specific web service.
The problem is that in a "normal" workflow the script works almost the whole day to complete the job.
Therefore I've worked on an additional component. Basically, I developed a script which accesses the main script using multi-curl GET requests, generates a random tempid for each 500 records, and finally makes another multi-curl POST request with all the generated tempids.
However, I don't feel this is the right way, so I would like some advice/solutions for adding multithreading capabilities to the main script without using additional/external applications (e.g. the curl script that I'm currently using).
Here is the main script: http://pastebin.com/rUQ6pwGS
Answers (2)
If you want to do it right you should install a message queue. My preference goes out to redis because it is a "data structure server since keys can contain strings, hashes, lists, sets and sorted sets". Also redis is extremely fast.
Use blpop to listen for new messages (work) and rpush to push new messages onto the queue, spawning a couple of worker processes with php <yourscript> to process the work concurrently. Spawning processes is (relatively) expensive, and when using a message queue it only has to be done once, when each worker is created. I would go for phpredis if you can (you need to recompile PHP) because it is an extension written in C and therefore a lot faster than the pure-PHP clients. Otherwise Predis is also a pretty mature library you could use.
You could also use brpop/rpush as some sort of lock, if you need one.
I would advise you to have a look at Simon's redis tutorial to get an impression of the sheer power that redis has to offer.
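The rpush/blpop pattern described above can be sketched with Predis roughly as follows. This is a minimal sketch, not the asker's actual code: the queue name "jobs", the JSON payload shape, and the process_batch() helper are illustrative assumptions.

```php
<?php
// Sketch: producer pushes 500-record batches, workers pull them off.
// Assumes a local redis server and Predis installed via Composer.
require 'vendor/autoload.php';

$redis = new Predis\Client(); // defaults to tcp://127.0.0.1:6379

// Producer side: enqueue the work in batches of 500.
foreach (array_chunk($records, 500) as $batch) {
    $redis->rpush('jobs', json_encode($batch));
}

// Worker side (run several copies: php worker.php):
while (true) {
    // blpop blocks until a job is available; timeout 0 means wait forever.
    // It returns [queueName, payload].
    [$queue, $payload] = $redis->blpop(['jobs'], 0);
    $batch = json_decode($payload, true);
    process_batch($batch); // hypothetical: your existing per-batch logic
}
```

Because blpop is atomic, each batch is delivered to exactly one worker, so you can scale just by starting more worker processes.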
This is a background process, correct? In that case, you should not run it via a web server. Run it from the command line, either as a daemon or as a cron job.
My preference is a "cron" job because you get automatic restart for free. Be sure that you don't have more instances of the program running than desired (You can achieve this by locking a file in the filesystem, doing something atomic in a database etc).
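The file-locking approach mentioned above can be sketched like this; the lock-file path is an assumption.

```php
<?php
// Single-instance guard: take an exclusive, non-blocking lock on a
// well-known file and bail out if another instance already holds it.
$fp = fopen('/tmp/worker.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    exit(0); // another instance is already running
}
// ... do the work; the lock is released automatically when the process exits.
```

This is safe even if the process crashes, since the OS releases the lock when the file handle is closed.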
Then you just need to start the number of processes you want, and have them read work from a queue.
Normally the pattern for doing this is to have a table with columns that store who is currently executing a given task:
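A minimal sketch of such a table (the table and column names are assumptions, not the answerer's actual schema):

```sql
CREATE TABLE tasks (
    id             INT PRIMARY KEY AUTO_INCREMENT,
    payload        TEXT NOT NULL,           -- the work to do
    status         ENUM('pending','done') NOT NULL DEFAULT 'pending',
    locked_by_pid  INT NULL,                -- pid of the worker holding the task
    locked_by_host VARCHAR(64) NULL,        -- optional, for distributed processing
    locked_time    DATETIME NULL            -- optional, for detecting stuck tasks
);
```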
Then the process will do the following pseudo-query to lock a set of tasks (batch_size is how many tasks per batch; it can be 1).
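The locking step might look like this (table and column names are assumptions; :pid, :host, and :batch_size are bound parameters):

```sql
-- Atomically claim a batch: only rows no one has locked yet.
UPDATE tasks
SET locked_by_pid  = :pid,
    locked_by_host = :host,
    locked_time    = NOW()
WHERE locked_by_pid IS NULL
  AND status = 'pending'
LIMIT :batch_size;
```

Because a single UPDATE is atomic, two workers running this concurrently can never claim the same row.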
Then read the claimed rows back with a SELECT to find the current process's tasks. Process the tasks, update them as "done", and clear the lock.
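Sketched as pseudo-queries (same assumed schema as above):

```sql
-- Fetch the rows this process just claimed...
SELECT id, payload
FROM tasks
WHERE locked_by_pid = :pid
  AND locked_by_host = :host
  AND status = 'pending';

-- ...and after processing each task, mark it done and clear the lock.
UPDATE tasks
SET status = 'done',
    locked_by_pid = NULL,
    locked_by_host = NULL,
    locked_time = NULL
WHERE id = :id;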
I'd opt for a cron job with a controller process which starts up N child processes and monitors them. The child processes could periodically die (remember PHP does not have good GC, so it can easily leak memory) and be respawned to prevent resource leaks.
If the work is all done, the parent could quit, and wait to be respawned by cron (the next hour or something).
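The controller pattern described above can be sketched with PHP's pcntl extension (CLI only). The child count and the work_on_queue()/queue_is_empty() helpers are illustrative assumptions.

```php
<?php
// Controller sketch: fork N children, respawn any that die,
// and quit once the queue is drained (cron restarts us later).
$numChildren = 4;
$children = [];

while (true) {
    // Keep N children alive.
    while (count($children) < $numChildren) {
        $pid = pcntl_fork();
        if ($pid === 0) {
            work_on_queue(); // hypothetical: child processes some tasks
            exit(0);         // children die periodically to avoid PHP memory leaks
        }
        $children[$pid] = true;
    }
    $pid = pcntl_wait($status); // block until any child exits
    unset($children[$pid]);

    if (queue_is_empty()) {     // hypothetical: check for remaining work
        break;                  // parent quits; cron respawns it next hour
    }
}
```

Letting children exit and be respawned keeps long-running memory growth bounded, which matters given PHP's weak garbage collection.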
NB: locked_by_host can store the host name (pids aren't unique across hosts) to allow for distributed processing, but maybe you don't need that, in which case you can omit it.
You can make this design more robust by adding a locked_time column and detecting when a task has been taking too long; you can then alert, kill the process, and retry, or take some other action.