How to set the encoding of an HtmlAgilityPack HtmlDocument

Posted on 2024-12-05 21:23:46

Here's my code:

HtmlWeb hw = new HtmlWeb();

HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"www.SomeForeignURL.com");

The returned HTML page includes characters that look strange, so I'd like to set the encoding of the returned document to UTF-8. How can I do this?

(I tried loading the document with htmlDoc.Load("url", Encoding.UTF8), but it returned an error saying that the URI is not supported, or something like that.)

Comments (1)

和影子一齐双人舞 2024-12-12 21:23:46

It's probably not that helpful, but I ran into a problem where the Load() method fails silently, probably because it fails to detect the format. I worked around it by loading the file into a string first (I used another function to download the file) and then calling the LoadHtml() method. I'm a year late answering, and I'm using PowerShell rather than C#, but the hint might still apply.

See the second-last line: it simply reads the file into a string and passes it to LoadHtml().

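# http://www.leeholmes.com/blog/2010/03/05/html-agility-pack-rocks-your-screen-scraping-world/

# Download $source and save it to the file $destination.
function DownloadFile {
    Param([Parameter(Mandatory=$true)]$source,
          [Parameter(Mandatory=$true)]$destination)

    $wc = New-Object System.Net.WebClient
    $wc.DownloadFile($source, $destination)
}

$ErrorActionPreference = 'Stop'
Set-StrictMode -Version 2

# Save the page under the same file name that is read back below.
DownloadFile "http://someurl/index.php?action=searchplayer&server=0&player=%" "$pwd\all.html"

$types = Add-Type -Path .\agilitypack\HtmlAgilityPack.dll
$doc = New-Object HtmlAgilityPack.HtmlDocument
# Read the whole file into a single string, decoded as UTF-8 (an assumption
# about the page's encoding), and pass it to LoadHtml().
$doc.LoadHtml([System.IO.File]::ReadAllText("$pwd\all.html", [System.Text.Encoding]::UTF8))
$doc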
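
For readers working in C#, as in the question, the same idea might be sketched roughly as follows. This is a minimal sketch, not a definitive fix: it assumes the page really is UTF-8, and the URL (including the "http://" prefix, which the question's code omits) is a placeholder. Note that HtmlDocument.Load(string, Encoding) expects a local file path rather than a URL, which would explain the error mentioned in the question.

```csharp
using System.Net;
using System.Text;
using HtmlAgilityPack;

class EncodingSketch
{
    static void Main()
    {
        // Placeholder URL (assumption): an absolute URL with a scheme.
        const string url = "http://www.SomeForeignURL.com";

        // Option 1: let HtmlWeb download the page, but force UTF-8
        // instead of relying on automatic encoding detection.
        var web = new HtmlWeb
        {
            AutoDetectEncoding = false,
            OverrideEncoding = Encoding.UTF8
        };
        HtmlDocument viaHtmlWeb = web.Load(url);

        // Option 2 (the approach from the answer): download the raw bytes,
        // decode them yourself, and pass the resulting string to LoadHtml().
        using (var client = new WebClient())
        {
            byte[] bytes = client.DownloadData(url);
            string html = Encoding.UTF8.GetString(bytes);

            var viaString = new HtmlDocument();
            viaString.LoadHtml(html);
        }
    }
}
```

Option 2 keeps the decoding step fully under your control, so if the page turns out to use a different encoding, only the Encoding.UTF8 argument needs to change.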