Truncated response for web pages containing 0x00 characters
I wrote a program that downloads web pages. It works fine for most pages, but I have found some where it doesn't work.
These pages contain 0x00 characters.
I'm able to read the page content up to this character, but not the content after it.
I use this code to read the response:
IAsyncResult ar = null;
HttpWebResponse resp = null;
Stream responseStream = null;
String content = null;
...
resp = (HttpWebResponse)req.EndGetResponse(ar);
responseStream = resp.GetResponseStream();
StreamReader sr = new StreamReader(responseStream, Encoding.UTF8);
content = sr.ReadToEnd();
In this example I use an asynchronous request, but I tried a synchronous one and had the same problem.
I also tried this, with the same result:
HttpWebResponse resp = null;
Stream responseStream = null;
String content = String.Empty;
...
responseStream = resp.GetResponseStream();
byte[] buffer = new byte[4096];
int bytesRead = 1;
while (bytesRead > 0)
{
bytesRead = responseStream.Read(buffer, 0, 4096);
content += Encoding.UTF8.GetString(buffer, 0, bytesRead);
}
For example, the problem occurs for this URL: http://www.daz3d.com/i/search/searchsub?sstring=ps_tx1662b&_m=dps_tx1662b
Thanks for your responses.
Euyeusu
Comments (2)
Your problem lies in transforming the received content into a string; you need to remove those 0x00 bytes:

It is the encoding that actually fails. To get around it you'll have to filter out the 0x00 bytes. Something like this should do the trick:
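The code from the answers did not survive on this page, so what follows is only a minimal sketch of the filtering idea they describe, not the answerers' original code. It assumes the page is UTF-8, as in the question. The names NulFilter and ReadWithoutNulBytes are hypothetical, and a MemoryStream stands in for the real response stream. The sketch buffers the whole response, drops every 0x00 byte, and decodes once at the end. In UTF-8 a 0x00 byte can only encode U+0000 and never occurs inside a multi-byte sequence, so removing it should not damage other characters.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class NulFilter
{
    // Hypothetical helper: buffers the whole stream, removes every 0x00 byte,
    // then decodes the remaining bytes as UTF-8 in a single call.
    public static string ReadWithoutNulBytes(Stream responseStream)
    {
        using (var raw = new MemoryStream())
        {
            // Copy everything first so multi-byte characters are never
            // split across read-buffer boundaries before decoding.
            responseStream.CopyTo(raw);

            // Keep only the bytes that are not 0x00.
            byte[] filtered = raw.ToArray().Where(b => b != 0x00).ToArray();

            return Encoding.UTF8.GetString(filtered);
        }
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in for resp.GetResponseStream(): "ab", a NUL byte, then "cd".
        byte[] sample = { (byte)'a', (byte)'b', 0x00, (byte)'c', (byte)'d' };
        using (var fakeResponse = new MemoryStream(sample))
        {
            Console.WriteLine(NulFilter.ReadWithoutNulBytes(fakeResponse)); // prints "abcd"
        }
    }
}
```

With the code from the question, the call would presumably be content = NulFilter.ReadWithoutNulBytes(responseStream); in place of the StreamReader or the manual read loop.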