Python: run multiple queries in parallel and take the first one that finishes
I am trying to create a Python script that performs queries against multiple sites. The script works well (I use urllib2), but only for one link. For multiple sites, I make the requests one after the other, which is not very efficient.
What is the ideal solution (threads, I guess) to run multiple queries in parallel and stop the others when one query returns a specific string, please?
I found this question, but I have not figured out how to change it to stop the remaining threads:
Python urllib2.urlopen() is slow, need a better way to read several urls
Thank you in advance!
(sorry if I made mistakes in English, I'm French ^^)
3 Answers
You can use Twisted to handle multiple requests concurrently. Internally it will use epoll (or iocp or kqueue, depending on the platform) to get notified of TCP availability efficiently, which is cheaper than using threads. Once one request matches, you cancel the others.
Here is the Twisted http agent tutorial.
Usually this is implemented with the following pattern (sorry, my Python skills are not so good).
You have a class named Runner. This class has a long-running method, which gets the information you need. It also has a Cancel method, which interrupts the long-running method in some way (you can make the URL request object a class member field, so the Cancel method calls the equivalent of request.terminate()).
The long-running method needs to accept a callback function, which it calls to signal when it is done.
Then, before you start your many threads, you create instances of that class, one per query, and keep them in a list. In the same loop you can start their long-running methods, passing a callback method of your main program.
And in the callback method, you just go through the list of all the threaded objects and call their Cancel methods.
Please edit my answer with any Python-specific implementation :)
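Taking up the invitation above, here is a minimal standard-library sketch of that pattern. The `PAGES` dict is a hypothetical stand-in for real HTTP responses (real code would fetch each URL with urllib2 inside the worker), and a shared `threading.Event` plays the role of the Cancel call:

```python
import threading

PAGES = {  # hypothetical stand-ins for the real page bodies
    "http://a.example": "nothing here",
    "http://b.example": "found: secret",
    "http://c.example": "nothing here either",
}

def find_first(needle, urls=PAGES):
    """Start one worker thread per URL; the first match sets a shared
    Event so the remaining workers can bail out early."""
    stop = threading.Event()
    matches = []
    lock = threading.Lock()

    def worker(url):
        if stop.is_set():              # another thread already won
            return
        body = PAGES[url]              # real code: urllib2.urlopen(url).read()
        if needle in body and not stop.is_set():
            with lock:
                matches.append(url)
            stop.set()                 # signal the other workers to stop

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return matches[0] if matches else None
```

One caveat: a thread blocked inside urllib2.urlopen cannot be interrupted from outside, so the Event is only checked between steps; real code should also set a short socket timeout so workers notice the flag promptly.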
You can run your queries with the multiprocessing library, poll for results, and shut down queries you no longer need. The documentation for the module includes information on the Process class, which has a terminate() method. If you wish to limit the number of requests sent out, check out the options for pooling.
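A minimal sketch of that approach, assuming a hypothetical `PAGES` dict stands in for the real HTTP responses (real code would download each URL inside `fetch`): the parent blocks on a Queue for the first match, then terminates the workers it no longer needs.

```python
import multiprocessing

PAGES = {  # hypothetical stand-ins for the real page bodies
    "http://a.example": "nothing here",
    "http://b.example": "the magic token is here",
    "http://c.example": "nothing here either",
}

def fetch(url, results):
    # Real code would use urllib2.urlopen(url).read() instead of PAGES
    body = PAGES[url]
    if "magic token" in body:
        results.put(url)

def first_match():
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=fetch, args=(u, results))
             for u in PAGES]
    for p in procs:
        p.start()
    winner = results.get()    # blocks until the first match arrives
    for p in procs:
        p.terminate()         # shut down the queries we no longer need
        p.join()
    return winner
```

Note that results.get() blocks forever if no site matches, so real code should pass a timeout; terminate() is harmless on workers that have already exited.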