Encoding differences between using WebClient and WebRequest?
When I fetch the index page of a Spanish newspaper with WebRequest, the diacritics don't come through properly; they show up as this odd character: �
With WebClient, downloading the response from the same URI gives me the correct text.
Why is there a difference?
var client = new WebClient();
string html = client.DownloadString(endpoint);
vs
WebRequest request = WebRequest.Create(endpoint);
using (WebResponse response = request.GetResponse())
{
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string html = reader.ReadToEnd();
}
Answers (1)
When you create the StreamReader without explicitly setting an encoding, you are simply assuming that the entity is in UTF-8. You should examine the CharacterSet of the HttpWebResponse (it is not exposed by the WebResponse base class) and open the StreamReader with the appropriate encoding. Otherwise, if the reader treats something that isn't UTF-8 as if it were UTF-8, it will come across octet sequences that aren't valid in UTF-8 and has to substitute the U+FFFD replacement character (�) as the best it can do.
WebClient does pretty much this for you. DownloadString is a higher-level method: where WebRequest and its derived classes let you work at a lower level, it offers a single call for "send a GET request to the URI, examine the headers to see what content-encoding is in use in case the body needs to be un-gzipped or otherwise decompressed, see what character encoding is in place, set up a text reader with that encoding over the stream, and then call ReadToEnd()". The usual pros and cons of high-level, big-chunk instructions versus low-level, small-chunk instructions apply.
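As a rough illustration of the advice above, the WebRequest version could pick its encoding from the response before building the reader. This is only a sketch under some assumptions: the class name EncodingAwareDownload and the helper ResolveEncoding are hypothetical names introduced here, the fallback to UTF-8 for a missing or unrecognized charset is a choice made for the example, and only the charset reported in the response headers is considered (a charset declared solely inside the HTML itself would not be picked up).

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, not part of the original question or answer.
public static class EncodingAwareDownload
{
    // Maps a charset name (as reported by the response) to an Encoding.
    // Assumption for this sketch: fall back to UTF-8 when the name is
    // missing or not recognized by the runtime.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;

        try
        {
            // Some servers quote the value, e.g. charset="iso-8859-1".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return Encoding.UTF8;
        }
    }

    public static string DownloadString(string endpoint)
    {
        WebRequest request = WebRequest.Create(endpoint);
        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet lives on HttpWebResponse, not on the WebResponse base class.
            var httpResponse = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(httpResponse?.CharacterSet);

            using (Stream stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling EncodingAwareDownload.DownloadString(endpoint) in place of the original using block should then decode a page served as, say, ISO-8859-1 without replacement characters, provided the server reports that charset correctly.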