Raw HTTP parsing with an asynchronous socket client

Posted 2024-11-04 20:48:23

I found another question that asked for the same type of functionality, but the question is more than 2 years old so I was wondering if anybody has seen anything since then.

I've basically written my own asynchronous HTTP/socket client using the standard .NET sockets. I maintain a pool of 1024 sockets, and 128 "service" threads use that pool to download web pages from the internet at up to 371 pages per second (tested today on a single Amazon EC2 server). I also made another asynchronous HTTP client which uses HttpWebRequest to download web pages asynchronously, but it's SIGNIFICANTLY slower: with the same setup (1024 pooled HttpWebRequests and 128 "service" threads), my throughput averages about 50 pages per second (also tested on Amazon EC2).

Naturally, providing HTTP protocol support will take up some more processing power and memory. I'm hoping that with Amazon's Extra Large EC2 server I will not be restricted by the processing power/memory, but by the network bandwidth only (which has been the case so far).

An example of the machine(s) that I'm using is Amazon's High-CPU Extra Large Instance:

  • 7 GB of memory
  • 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
  • 1690 GB of instance storage
  • 64-bit platform
  • I/O Performance: High
  • API name: c1.xlarge

I can write my own HTTP processing which complies with the HTTP protocol, but it will save me a TON of work, pain and suffering if there is an off-the-shelf solution that is fast and robust.

I need the following functionality at the very minimum:

  • Building HTTP HEAD/GET (and maybe POST) requests
  • Parsing HTTP responses from a binary stream
  • Cookie support
  • LGPL license
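
To make the first three requirements concrete, here is a minimal C# sketch of what such a layer could look like on top of a raw socket buffer. It is not taken from any existing library: the names `RawHttp`, `HttpResponseHead`, `BuildRequest`, `TryParseHead` and `ToCookieHeader` are hypothetical, and the sketch deliberately ignores chunked transfer encoding, request bodies, and cookie attributes such as Domain, Path and Expires.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical container for the parsed status line and headers.
public sealed class HttpResponseHead
{
    public string Version = "";
    public int StatusCode;
    public string Reason = "";
    // Header names are case-insensitive; a header such as Set-Cookie may repeat.
    public Dictionary<string, List<string>> Headers =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    public int BodyOffset; // index of the first body byte in the buffer
}

public static class RawHttp
{
    // Builds a minimal HTTP/1.1 request head (no body) as ASCII bytes.
    public static byte[] BuildRequest(string method, string host, string path,
                                      string cookieHeader = null)
    {
        var sb = new StringBuilder();
        sb.Append(method).Append(' ').Append(path).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(host).Append("\r\n");
        sb.Append("Connection: keep-alive\r\n");
        if (!string.IsNullOrEmpty(cookieHeader))
            sb.Append("Cookie: ").Append(cookieHeader).Append("\r\n");
        sb.Append("\r\n");
        return Encoding.ASCII.GetBytes(sb.ToString());
    }

    // Parses the status line and headers from the first `count` bytes.
    // Returns null while the blank line that ends the headers has not
    // arrived yet, so the caller can keep receiving and try again.
    public static HttpResponseHead TryParseHead(byte[] buffer, int count)
    {
        int end = IndexOfHeaderEnd(buffer, count);
        if (end < 0) return null;

        string text = Encoding.ASCII.GetString(buffer, 0, end);
        string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

        string[] status = lines[0].Split(new[] { ' ' }, 3);
        if (status.Length < 2)
            throw new FormatException("Malformed status line: " + lines[0]);

        var head = new HttpResponseHead
        {
            Version = status[0],
            StatusCode = int.Parse(status[1]),
            Reason = status.Length == 3 ? status[2] : "",
            BodyOffset = end + 4 // skip the CRLF CRLF terminator
        };

        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue; // skip lines that are not "name: value"
            string name = lines[i].Substring(0, colon).Trim();
            string value = lines[i].Substring(colon + 1).Trim();
            if (!head.Headers.TryGetValue(name, out var values))
                head.Headers[name] = values = new List<string>();
            values.Add(value);
        }
        return head;
    }

    // Reduces Set-Cookie values to a Cookie request header value,
    // keeping only name=value and dropping all attributes.
    public static string ToCookieHeader(IEnumerable<string> setCookieValues)
    {
        var pairs = new List<string>();
        foreach (string v in setCookieValues)
        {
            int semi = v.IndexOf(';');
            pairs.Add((semi >= 0 ? v.Substring(0, semi) : v).Trim());
        }
        return string.Join("; ", pairs);
    }

    // Finds the index of the CRLF CRLF sequence, or -1 if it is not present.
    private static int IndexOfHeaderEnd(byte[] b, int count)
    {
        for (int i = 0; i + 3 < count; i++)
            if (b[i] == 13 && b[i + 1] == 10 && b[i + 2] == 13 && b[i + 3] == 10)
                return i;
        return -1;
    }
}
```

In an asynchronous receive loop, `TryParseHead` would be called after each completed read on the accumulated buffer; once it returns a non-null head, `BodyOffset` and the `Content-Length` header tell the caller how many more bytes belong to the body. A real crawler would still need a proper cookie jar and chunked-body handling on top of this.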

Does anybody know of any such solutions?

Comments (1)

回忆那么伤 2024-11-11 20:48:24

I don't know how HttpWebRequest works with sockets internally. Opening and closing sockets might be a big performance hit. WebClient uses keep-alive and might work better.
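
On the keep-alive point: with a hand-written socket client, the decision whether a socket can go back to the pool or has to be closed follows from the response's HTTP version and `Connection` header. The sketch below shows that rule only; the `ConnectionReuse` class and its `CanReuse` method are hypothetical names, and it assumes the response body has already been read completely before the socket is reused.

```csharp
using System;

public static class ConnectionReuse
{
    // Decides whether a socket may be returned to the pool after a response.
    // httpVersion is the version token from the status line, e.g. "HTTP/1.1";
    // connectionHeader is the Connection header value, or null when absent.
    public static bool CanReuse(string httpVersion, string connectionHeader)
    {
        // An explicit "close" always wins.
        if (HasToken(connectionHeader, "close")) return false;

        // HTTP/1.1 connections are persistent by default.
        if (string.Equals(httpVersion, "HTTP/1.1", StringComparison.OrdinalIgnoreCase))
            return true;

        // HTTP/1.0 connections persist only when the server opts in.
        return HasToken(connectionHeader, "keep-alive");
    }

    // The Connection header is a comma-separated, case-insensitive token list.
    private static bool HasToken(string headerValue, string token)
    {
        if (string.IsNullOrEmpty(headerValue)) return false;
        foreach (string part in headerValue.Split(','))
            if (string.Equals(part.Trim(), token, StringComparison.OrdinalIgnoreCase))
                return true;
        return false;
    }
}
```

Sockets for which `CanReuse` returns false would be closed and replaced with a fresh connection; the others can serve the next request to the same host without paying for a new TCP handshake.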

Edit: I did a bit of googling and I wouldn't accept this as an answer. WebClient seems to be a wrapper around HttpWebRequest/Response: http://www.codeproject.com/Articles/156610/WP7-WebClient-vs-HttpWebRequest.aspx?msg=3775084

Update

Since you have started with sockets, I would stick with them. Feel free to take stuff from my webserver project: http://webserver.codeplex.com

My parser:

http://webserver.codeplex.com/SourceControl/changeset/view/56552#671689
