How to set the encoding of an HtmlAgilityPack HtmlDocument
Here's my code:
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"www.SomeForeignURL.com");
The returned HTML page includes strange-looking characters. I'd like to set the encoding of the returned document to UTF-8. How can I solve this?
(I tried loading the document like so: htmlDoc.Load("url", Encoding.UTF8), but it returned an error saying that the URI is not supported, or something like that.)
Comments (1)
It's probably not that helpful, but I ran into a problem where the Load() method failed silently, probably because it could not detect the format. I worked around it by loading the file into a string first (I used another function to download the file) and then calling the LoadHTML() method. I'm a year late answering, and I'm using PowerShell rather than C#, but the hint might still apply.
See the second-to-last line: it simply reads the file into a string and passes it to LoadHTML().
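For readers working in C#, the workaround described in this answer might look roughly like the sketch below. It is not the answerer's script. The class name Utf8HtmlLoader, the helper names ParseUtf8 and DownloadAsync, and the use of HttpClient for the download are all assumptions made for illustration. It also assumes the page really is UTF-8 encoded. In C# the method is spelled LoadHtml. Note that HtmlDocument.Load(string, Encoding) expects a file path, which is the likely reason the attempt in the question reported an unsupported URI.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper class, named here only for illustration.
public static class Utf8HtmlLoader
{
    // Decode the raw bytes explicitly as UTF-8, then hand the resulting
    // string to LoadHtml so no encoding detection is involved.
    public static HtmlDocument ParseUtf8(byte[] rawBytes)
    {
        string html = Encoding.UTF8.GetString(rawBytes);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Download the page as bytes (assumption: HttpClient is available and
    // the URL is absolute, including the http:// or https:// scheme).
    public static async Task<HtmlDocument> DownloadAsync(string url)
    {
        using (var client = new HttpClient())
        {
            byte[] rawBytes = await client.GetByteArrayAsync(url);
            return ParseUtf8(rawBytes);
        }
    }
}
```

If HtmlWeb is kept for the download, setting its OverrideEncoding property to Encoding.UTF8 before calling Load may be a simpler route. Which of the two fits better depends on whether the page declares its encoding correctly.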