C# not getting a proper response from HttpWebResponse. Encoding?

Posted 2024-10-04 05:12:02

I'm trying to fetch some webpages using the code below:

    public static string FetchPage(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

        req.Method = "GET";

        req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; sv-SE; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 3.5.30729";
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        req.Headers.Add("Accept-Language", "sv-se,sv;q=0.8,en-us;q=0.5,en;q=0.3");
        req.Headers.Add("Accept-Encoding", "gzip,deflate");
        req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        req.Headers.Add("Keep-Alive", "115");
        req.Headers.Add("Cache-Control: max-age=0");
        req.AllowAutoRedirect = true;

        req.IfModifiedSince = DateTime.Now;

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            using (Stream resStream = resp.GetResponseStream())
            {
                StreamReader reader = new StreamReader(resStream);
                return reader.ReadToEnd();
            }
        }
    }

Some pages work (W3C, example.com), while most others I've tried do not (BBC.co.uk, CNN.com, etc.). Wireshark shows that I'm getting a proper response.

I've tried setting the reader's encoding to the one the response is expected to use (UTF-8 for CNN), as well as every other combination I could think of, but I've had no luck.

What am I missing here?

The first bytes of my response are always "1f ef bf bd", if you're able to tell something based on that.

Comments (2)

心凉怎暖 2024-10-11 05:12:02

I suspect the most likely explanation is that you are receiving compressed data and not decompressing it. Try wrapping the response stream in a stream filter that inflates/gunzips it. See Rick Strahl's blog article for more info.
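
To illustrate the idea, here is a minimal sketch rather than a definitive implementation. It rests on two assumptions: the server answered with Content-Encoding: gzip, and the page text is UTF-8. The class GzipBodyDemo and its methods ReadBody, Gzip and MisreadPrefix are hypothetical names made up for this example. It also hints at where the "1f ef bf bd" in the question probably comes from: a gzip stream starts with the bytes 1f 8b, and a lone 8b is not valid UTF-8, so a text decoder replaces it with U+FFFD, which is ef bf bd when written back out as UTF-8.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipBodyDemo
{
    // Hypothetical helper: wraps the body in a GZipStream when the server
    // reported "gzip" as the content encoding, then decodes the bytes as text.
    // Any other content encoding is read as-is (an assumption of this sketch).
    public static string ReadBody(Stream body, string contentEncoding, Encoding charset)
    {
        Stream source = body;
        if (string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
        {
            source = new GZipStream(body, CompressionMode.Decompress);
        }

        using (StreamReader reader = new StreamReader(source, charset))
        {
            return reader.ReadToEnd();
        }
    }

    // Builds a gzip-compressed payload in memory so the sketch needs no network.
    public static byte[] Gzip(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the buffer.
            return buffer.ToArray();
        }
    }

    // Decodes the first two compressed bytes as UTF-8 text and re-encodes them,
    // which is roughly what happens when a StreamReader reads a gzip body directly.
    public static string MisreadPrefix(byte[] compressed)
    {
        string text = Encoding.UTF8.GetString(compressed, 0, 2);
        byte[] reencoded = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(reencoded);
    }
}
```

Inside FetchPage this would be called along the lines of ReadBody(resp.GetResponseStream(), resp.ContentEncoding, Encoding.UTF8). A "deflate" response would need a similar branch, which this sketch leaves out.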

撩心不撩汉 2024-10-11 05:12:02

Loading http://bbc.co.uk worked for me once I left out the line that sets the "Accept-Encoding" header:

    req.Headers.Add("Accept-Encoding", "gzip,deflate");
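
If you would rather keep compressed transfers than drop the header, one option is to let HttpWebRequest do the decompression through its AutomaticDecompression property. The following is a sketch under assumptions, not the asker's final code: PageFetcher and CreateRequest are hypothetical names, the manual Accept-Encoding line is removed on the understanding that the framework advertises the encodings itself once the property is set, and the body is assumed to be UTF-8.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: builds the request and asks the framework to
    // decompress gzip and deflate bodies before they reach the caller.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.AllowAutoRedirect = true;

        // No manual Accept-Encoding header here; this property replaces it.
        req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return req;
    }

    public static string FetchPage(string url)
    {
        HttpWebRequest req = CreateRequest(url);

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        // Assumption: the page is UTF-8; a real caller might inspect resp.CharacterSet.
        using (StreamReader reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this in place the reader sees already-decompressed bytes, so the choice of text encoding becomes the only remaining variable.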