What are the options for multithreading/concurrent programming in Python?
I'm writing a simple site spider and I've decided to take this opportunity to learn something new in concurrent programming in Python. Instead of using threads and a queue, I decided to try something else, but I don't know what would suit me.
I have heard about Stackless, Celery, Twisted, Tornado, and others. I don't want to have to set up a database and all the other dependencies of Celery, but I would if it's a good fit for my purpose.
My question is: what is a good balance between suitability for my app and usefulness in general? I have taken a look at the tasklets in Stackless, but I'm not sure whether the urlopen() calls will block or whether they will execute in parallel. I haven't seen that mentioned anywhere.
Can someone give me a few details on my options and what would be best to use?
Thanks.
Comments (3)
Tornado is a web server, so it wouldn't help you much in writing a spider. Twisted is much more general (and, inevitably, more complex), good for all kinds of networking tasks, and it integrates well with the event loops of several GUI frameworks. Indeed, there used to be a twisted.web.spider, but it was removed years ago because it was unmaintained, so you'll have to roll your own on top of the facilities Twisted does provide.
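To give an idea of what "rolling your own" might involve, here is a minimal sketch of the framework-independent part of a spider: link extraction and the crawl frontier. It uses only the standard library and does not touch Twisted at all. The names `LinkExtractor`, `extract_links`, `crawl`, and the `fetch` callable are hypothetical, made up for illustration; `fetch(url)` is assumed to return the page's HTML as a string.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs linked from the given page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the start URL's host.

    fetch is a hypothetical callable (an assumption of this sketch):
    it takes a URL and returns the HTML text of that page.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # the crawl frontier
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(url, fetch(url)):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

In this sketch `fetch` is called synchronously, so pages are downloaded one at a time. With an event-driven library, that call would be replaced by the library's non-blocking HTTP client and the loop body would run as a callback when each response arrives; the bookkeeping in `seen` and `queue` stays the same.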
I must say that Twisted gets my vote.
Performing event-driven tasks is fairly straightforward in Twisted. Integration with other important system components such as GTK+ and DBus is very easy.
The HTTP client support is basic for now but improving (>9.0.0): see related question.
The added bonus is that Twisted is available in the Ubuntu default repository ;-)
For a quick look at package sizes, see ohloh.net/p/compare.
Of course, source size is only a rough metric (what I'd really like is the number of pages of documentation, the number of pages of examples, and the dependencies), but it can help.