Converting a string or HTML file to a C# HtmlDocument without using WebBrowser or HAP
The only solution I could find was using:
mshtml.HTMLDocument htmldocu = new mshtml.HTMLDocument();
htmldocu.createDocumentFromUrl(url, "");
I am not sure about the performance, though it should be better than loading the HTML file in a WebBrowser and then grabbing the HtmlDocument from there. Anyhow, that code does not work on my machine: the application crashes when it tries to execute the second line.
Does anyone have an approach to achieve this efficiently, or any other way to do it?
NOTE: Please understand that I need the HtmlDocument object for DOM processing. I do not need the html string.
Comments (2)
Use the DownloadString method of the WebClient object. For example, if you call DownloadString with http://www.google.com and assign the result to a string named reply, then after it executes, reply will contain the HTML markup of that endpoint.
See: WebClient.DownloadString (MSDN)
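The example this answer refers to is not shown here, so the following is only a minimal sketch of what such a call might look like. The class name DownloadExample and the helper FetchMarkup are made up for illustration; only WebClient, DownloadString, reply and the URL come from the answer.

```csharp
using System;
using System.Net;

static class DownloadExample
{
    // Hypothetical helper (not from the answer): wraps WebClient.DownloadString.
    public static string FetchMarkup(string address)
    {
        // WebClient implements IDisposable, so dispose it once the call returns.
        using (var client = new WebClient())
        {
            return client.DownloadString(address);
        }
    }

    static void Main()
    {
        // Needs network access; the URL is the one named in the answer.
        string reply = FetchMarkup("http://www.google.com");
        Console.WriteLine(reply.Length);
    }
}
```

Note that this yields the markup as a plain string, not a DOM object, so on its own it does not cover the DOM processing the question asks for. On recent .NET versions WebClient is marked obsolete in favor of HttpClient, although it still works.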
In an attempt to answer your actual question from four years ago (at the time of me posting this answer), I'm providing a working solution. I wouldn't be surprised if you have found another way to do this by now, so this is mostly for other people searching for a similar solution. Keep in mind, however, that this is considered [...] actual use of HtmlDocument). Additionally, keep in mind that HtmlDocument is really just a wrapper for mshtml.HTMLDocument2, so it is technically slower than just using a COM wrapper directly, but I completely understand the use case simply for ease of coding.
If you're cool with all of the above, here's how to accomplish what you want.
To use it:
I have not tested this code directly; I translated it from an old PowerShell script that needed the same functionality you're requesting. If it fails, let me know. The functionality is there, but the code might need very minor tweaking to get working.
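The code that accompanied this answer is not included here. As a stand-in, and not as the answerer's own method, here is a minimal sketch of the "use the COM wrapper directly" route mentioned above: it writes an HTML string into an mshtml document through IHTMLDocument2. It assumes a Windows project with a reference to the Microsoft.mshtml interop assembly. The class name HtmlStringParser and the method Parse are hypothetical, and the result is the COM document, not a System.Windows.Forms.HtmlDocument.

```csharp
using System;
using mshtml; // Assumes a reference to the Microsoft.mshtml COM interop assembly (Windows only).

static class HtmlStringParser
{
    // Hypothetical helper (not from the answer): builds a COM HTML document from markup.
    public static IHTMLDocument2 Parse(string html)
    {
        // Same construction as in the question, cast to the interface that exposes write().
        var document = (IHTMLDocument2)new HTMLDocument();

        // write() takes an object array; close() ends the write stream.
        document.write(new object[] { html });
        document.close();
        return document;
    }

    [STAThread] // COM components like this are typically used from an STA thread.
    static void Main()
    {
        IHTMLDocument2 doc = Parse(
            "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>");

        // DOM access goes through the COM interfaces rather than HtmlDocument.
        Console.WriteLine(doc.title);
        Console.WriteLine(doc.body.innerText);
    }
}
```

This avoids both WebBrowser and HAP, at the cost of working with the raw mshtml interfaces instead of the friendlier HtmlDocument wrapper. Treat it as untested outside of the assumptions stated above.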