C# Not getting a correct response from HttpWebResponse. Encoding?

Posted on 2024-10-04 05:12:02

I'm trying to fetch some webpages using the code below:

    public static string FetchPage(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";

        req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; sv-SE; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 3.5.30729";
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        req.Headers.Add("Accept-Language", "sv-se,sv;q=0.8,en-us;q=0.5,en;q=0.3");
        req.Headers.Add("Accept-Encoding", "gzip,deflate");
        req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        req.Headers.Add("Keep-Alive", "115");
        req.Headers.Add("Cache-Control: max-age=0");
        req.AllowAutoRedirect = true;
        req.IfModifiedSince = DateTime.Now;

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            using (Stream resStream = resp.GetResponseStream())
            {
                StreamReader reader = new StreamReader(resStream);
                return reader.ReadToEnd();
            }
        }
    }

Some pages work (W3C, example.com), while most others I've tried do not (BBC.co.uk, CNN.com, etc.). Wireshark shows that I'm getting a proper response.

I've tried setting the reader's encoding to the response's expected encoding (UTF-8 for CNN), as well as every other combination I could think of, but with no luck.

What am I missing here?

The first bytes of my response are always "1f ef bf bd", if that tells you anything.


心凉怎暖 2024-10-11 05:12:02

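I suspect the most likely explanation is that you're getting compressed data and not decompressing it. Try using a stream filter to decompress it. See Rick Strahl's blog post for more details.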
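A minimal sketch of this diagnosis, assuming the server honoured the question's `Accept-Encoding: gzip,deflate` header. A gzip stream starts with the bytes `1f 8b`; `0x8b` is not valid UTF-8 on its own, so `StreamReader` replaces it with U+FFFD, which becomes `ef bf bd` when the string is encoded back to UTF-8. That matches the "1f ef bf bd" prefix in the question. Here the "stream filter" is `System.IO.Compression.GZipStream` placed between the body and the reader. The HTML string is made up and compressed in memory so the example runs without a network; `GzipDemo` and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Hypothetical helper: gzip-compress a string in memory to stand in
    // for a response body sent with "Content-Encoding: gzip".
    public static byte[] Gzip(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the gzip trailer
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // What the question's code effectively does: read compressed bytes as text.
    public static string ReadWithoutDecompressing(Stream body)
    {
        using (var reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }

    // The fix: decompress first, then decode the text.
    public static string ReadDecompressed(Stream body)
    {
        using (var gzip = new GZipStream(body, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] body = Gzip("<html><body>Hello</body></html>");

        // Garbled: 0x1f survives, 0x8b becomes U+FFFD.
        string garbled = ReadWithoutDecompressing(new MemoryStream(body));
        byte[] reencoded = Encoding.UTF8.GetBytes(garbled);
        Console.WriteLine(BitConverter.ToString(reencoded, 0, 4));

        // Correct: the original HTML.
        Console.WriteLine(ReadDecompressed(new MemoryStream(body)));
    }
}
```

Running `Main` should print `1F-EF-BF-BD` followed by the original HTML string.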


撩心不撩汉 2024-10-11 05:12:02

Loading http://bbc.co.uk worked for me when I left out the "Accept-Encoding" header:

    req.Headers.Add("Accept-Encoding", "gzip,deflate");
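Leaving the header out works because the server then sends an uncompressed body. If you would rather keep compression, a sketch (assuming the same `HttpWebRequest` API as the question) is to drop the hand-written header and set `AutomaticDecompression` instead, so the framework advertises gzip/deflate itself and decompresses the body before `StreamReader` sees it. `PageFetcher` and `CreatePageRequest` are hypothetical names, and the question's other headers are omitted for brevity.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Hypothetical helper: builds the request without a manual
    // Accept-Encoding header.
    public static HttpWebRequest CreatePageRequest(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.AllowAutoRedirect = true;
        // The framework advertises gzip/deflate in Accept-Encoding and
        // transparently decompresses the response body.
        req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return req;
    }

    public static string FetchPage(string url)
    {
        HttpWebRequest req = CreatePageRequest(url);
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        using (var reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }
}
```

On .NET 6 and later `WebRequest.Create` is marked obsolete; `HttpClientHandler.AutomaticDecompression` plays the same role for `HttpClient`.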


About the author: 烛影斜
