Help eliminating a loop in an async receiver caused by content-length = 0 in the HTTP response
Some web servers return content-length set to zero in the HTTP response headers. I'd like a deterministic and performant solution for receiving all the data in that situation.
URL known to exhibit this behavior (additional URLs below):
http://www.washingtonpost.com/wp-dyn/content/article/2010/02/12/AR2010021204894.html?hpid=topnews
headers:
Cache-control:no-cache
Connection:close
Content-Encoding:gzip
Content-type:text/html
Server:Web Server
Transfer-encoding:chunked
My current solution is not guaranteed to get all the data because of the MaxTries constant, and it is slow because of Thread.Sleep():
private bool MoreDataIsAvailable()
{
    int avail = _socket.Available;
    if (avail == 0 &&
        _contentLength != null && _contentLength == 0)
    {
        int tries = 0;
        while (avail == 0 && tries < MaxTries)
        {
            Thread.Sleep(5);
            _socket.Poll(1000, SelectMode.SelectRead);
            avail = _socket.Available;
            tries++;
            if (avail > 0)
            {
                Console.WriteLine(_socket.Handle + " avail = " + avail + " received = " + _bytes.Length + " && tries = " + tries);
            }
        }
    }
    return avail > 0;
}
Usage in context:
private void ReceiveCallback(object sender, SocketAsyncEventArgs e)
{
    if (ConnectionWasClosed(e) || HadSocketError(e))
    {
        _receiveDone.Set();
        return;
    }
    StoreReceivedBytes(e);
    if (AllBytesReceived())
    {
        _receiveDone.Set();
        return;
    }
    if (MoreDataIsExpected() || MoreDataIsAvailable())
    {
        WaitForBytes(e);
    }
    else
    {
        _receiveDone.Set();
    }
}
Sample output:
1436 avail = 3752 received = 1704 && tries = 9
1436 avail = 3752 received = 9208 && tries = 8
1436 avail = 3752 received = 12960 && tries = 9
1436 avail = 3752 received = 20464 && tries = 8
1436 avail = 3752 received = 27968 && tries = 7
1436 avail = 7504 received = 31720 && tries = 1
1436 avail = 3752 received = 39224 && tries = 6
edit:
Nikolai observed that responses with a Transfer-encoding: chunked header need special handling but their ends can be detected deterministically.
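A minimal sketch of that deterministic end check, assuming the header block has already been stripped and the body bytes received so far sit in a single array. `ChunkedBody` and `IsComplete` are hypothetical names, not part of the code above; the chunk layout (hex size line, data, CRLF, then a zero-size last chunk followed by optional trailers and an empty line) is the one described in RFC 2616, section 3.6.1.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ChunkedBody
{
    // Returns true once 'body' contains the zero-size last chunk and the
    // empty line that ends the (possibly empty) trailer section.
    // Throws FormatException if a chunk-size line is not valid hex.
    public static bool IsComplete(byte[] body)
    {
        int pos = 0;
        while (true)
        {
            int lineEnd = IndexOfCrLf(body, pos);
            if (lineEnd < 0) return false;      // size line not fully received yet

            // The size is hexadecimal; anything after ';' is a chunk extension.
            string sizeLine = Encoding.ASCII.GetString(body, pos, lineEnd - pos);
            int semi = sizeLine.IndexOf(';');
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);
            int size = int.Parse(sizeLine.Trim(), NumberStyles.AllowHexSpecifier,
                                 CultureInfo.InvariantCulture);
            if (size < 0) throw new FormatException("Chunk size out of range.");

            pos = lineEnd + 2;                  // step over the size line's CRLF
            if (size == 0)
            {
                // Last chunk: skip trailer lines until an empty line is found.
                while (true)
                {
                    int end = IndexOfCrLf(body, pos);
                    if (end < 0) return false;  // trailer not fully received yet
                    if (end == pos) return true;
                    pos = end + 2;
                }
            }

            // Chunk data plus its trailing CRLF must be present before moving on.
            if (size > body.Length - pos - 2) return false;
            pos += size + 2;
        }
    }

    private static int IndexOfCrLf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
            if (data[i] == (byte)'\r' && data[i + 1] == (byte)'\n') return i;
        return -1;
    }
}
```

Something like this could be called after each StoreReceivedBytes(e) when the response is chunked. It rescans from the start on every call, so remembering the last parsed position would be the obvious refinement for large bodies.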
Excluding the chunked responses, however, there are still other URLs that end up in my catch-all method, for example:
http://www.biomedcentral.com/1471-2105/6/197
headers:
Cache-control:private
Connection:close
Content-Type:text/html
P3P:policyref="/w3c/p3p.xml", CP="NOI DSP COR CURa ADMa DEVa TAIa OUR BUS PHY ONL UNI COM NAV INT DEM PRE"
Server:Microsoft-IIS/5.0
X-Powered-By:ASP.NET
http://slampp.abangadek.com/info/
headers:
Connection:close
Content-Type:text/html
Server:Apache/2.2.8 (Ubuntu) DAV/2 PHP/5.2.4-2ubuntu5.3 with Suhosin-Patch mod_ruby/1.2.6 Ruby/1.8.6(2007-09-24) mod_ssl/2.2.8 OpenSSL/0.9.8g
X-Cache:MISS from server03.abangadek.com
X-Powered-By:PHP/5.2.4-2ubuntu5.3
http://video.forbes.com/embedvideo/?format=frame&height=515&width=336&mode=render&networklink=1
headers:
Connection:close
Content-Language:en-US
Content-Type:text/html;charset=ISO-8859-1
Server:Apache-Coyote/1.1
I would like to know what I can look for in these responses that, like the Transfer-encoding header did for the first URL, gives a clue to reading the entire response deterministically so that the call to my method can be avoided.
Comments (1)
From the URL given, it seems you are looking at HTTP chunked transfer encoding, which allows the server to start transmitting the response before the total length is known while still allowing the client to reliably determine the end of the response.
Also see RFC 2616, section 3.6.1.
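For the other URLs in the question, which send neither Transfer-encoding nor Content-Length, RFC 2616, section 4.4 says the body length is determined by the server closing the connection. Below is a rough sketch of picking the framing rule from the headers. `BodyFraming`, `Framing.Choose`, and the header dictionary are hypothetical names for illustration, and the sketch ignores responses that never carry a body (1xx, 204, 304, replies to HEAD).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical types, for illustration only.
public enum BodyFraming { Chunked, ContentLength, UntilClose }

public static class Framing
{
    // 'headers' is assumed to be keyed case-insensitively by header name.
    // 'length' is the declared Content-Length, or -1 when it does not apply.
    public static BodyFraming Choose(IDictionary<string, string> headers, out long length)
    {
        length = -1;

        // Any Transfer-Encoding other than "identity" takes precedence
        // over Content-Length.
        string te;
        if (headers.TryGetValue("Transfer-Encoding", out te) &&
            !te.Trim().Equals("identity", StringComparison.OrdinalIgnoreCase))
        {
            return BodyFraming.Chunked;
        }

        // Otherwise a parseable, non-negative Content-Length gives the byte count.
        string cl;
        if (headers.TryGetValue("Content-Length", out cl) &&
            long.TryParse(cl.Trim(), NumberStyles.None, CultureInfo.InvariantCulture, out length))
        {
            return BodyFraming.ContentLength;
        }

        // Neither applies: the body runs until the server closes the connection.
        length = -1;
        return BodyFraming.UntilClose;
    }
}
```

With `UntilClose`, a receive that completes with zero bytes transferred marks the end of the body. That is presumably what ConnectionWasClosed(e) already checks, so the callback could simply keep calling WaitForBytes(e) until then, without polling or sleeping.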