Making a large number of HTTP requests in Python
I'm trying to test a web application. Part of this involves making ~10K requests, taking the few (<1K) that return 200 OK,
and going through their data. The web app is buggy and produces false positives, so each 200 OK needs to be checked at least three times.
Working in Python, I tried to do this with threading and urllib, but on Linux I get thread errors after ~920 threads. (My theory is that it's /proc/sys/kernel/threads-max
divided by thirty, which is eerily accurate, but it's perturbing that each thread would register as 30 threads with the OS.) In any case, I'm looking for a good solution for this task. I've looked into Twisted, but it seems I would still be bound by threading.
Any ideas?
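For reference, one way to stay well under the thread limit is a fixed-size worker pool instead of one thread per request. A minimal sketch using the standard library's `concurrent.futures` (the `check` function and URLs are stand-ins; a real version would make the urllib request there):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real request: in practice this would fetch the URL
# with urllib and return the HTTP status code.
def check(url):
    # Simulate: every 10th URL "returns" 200 OK.
    return 200 if url.endswith("0") else 404

urls = [f"http://example.test/item{i}" for i in range(100)]

# A fixed pool of 50 workers means at most 50 requests in flight at
# once, no matter how many URLs there are -- no per-request threads.
with ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(check, urls))

ok = [u for u, s in zip(urls, statuses) if s == 200]
print(len(ok))  # 10
```

The 200 OK survivors in `ok` can then be re-queued through the same pool for the triple-check pass.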
4 Answers
I was testing the Tornado web server with Apache ab and was unable to go much over 1000 connections per second on my dual-core Athlon @ 2 GHz. The testing tool ab took 30% of the resources, and the rest went to the server. I'm fairly convinced that most of the resources are spent by the OS and the IP/Ethernet layer.
http://amix.dk/blog/post/19581
Non-blocking servers have better performance than blocking servers because they do not spawn a thread for each connection. In theory they can run in a single thread.
You could try using asynchronous HTTP requests (there's sample code at the bottom of the article).
I've had good success with FunkLoad for scripting bulk transactions with web sites.
I have used the Python bindings for libcurl (pycurl) in the past for this. Use the multi-client feature, which performs the requests asynchronously in C. It's pretty fast.