How to parse HTML in VB.NET
I would like to know if there is a simple way to parse HTML in VB.NET.
I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?
Comments (5)
' Add a project reference to Microsoft.mshtml as well
' then on the page:
As found here:
I like Html Agility Pack: it's very developer friendly, free, and the source code is available.
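To make the suggestion concrete, here is a minimal sketch of reading links out of sloppy markup with Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup and the module name are made up for illustration.

```vbnet
Imports System
Imports HtmlAgilityPack

Module AgilityPackSketch
    Sub Main()
        ' Illustrative tag soup: the <p> and <li> elements are never closed.
        Dim html As String = "<html><body><p>Links<ul><li><a href='/a'>First</a><li><a href='/b'>Second</a></ul></body></html>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes takes an XPath expression and returns Nothing
        ' (not an empty collection) when nothing matches.
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If links Is Nothing Then
            Console.WriteLine("No links found.")
            Return
        End If

        For Each link As HtmlNode In links
            Console.WriteLine("{0} -> {1}", link.InnerText, link.GetAttributeValue("href", ""))
        Next
    End Sub
End Module
```

The XPath query is what gives this the XML-like feel the question asks for, even though the input is not well-formed XML.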
Don't use Agility Pack; just use the mshtml library to access the DOM. This is what IE uses, and it is great for walking through HTML elements.
Agility Pack is nasty and unnecessarily hacky if you ask me; mshtml is the way to go. Look it up on MSDN.
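For comparison, a rough sketch of the mshtml route might look like the following. It assumes a COM reference to Microsoft.mshtml (the Microsoft HTML Object Library) on a Windows machine; the sample markup and module name are hypothetical, and whether the document is fully parsed as soon as close() returns may depend on the content, so treat this as a starting point rather than a tested recipe.

```vbnet
Imports System
Imports mshtml

Module MshtmlSketch
    <STAThread>
    Sub Main()
        ' Illustrative markup; absolute URLs keep href resolution out of the picture.
        Dim html As String = "<html><body><a href='http://example.com/a'>First</a><a href='http://example.com/b'>Second</a></body></html>"

        ' Create an empty document and write the markup into it.
        Dim doc As New HTMLDocument()
        Dim writer As IHTMLDocument2 = DirectCast(doc, IHTMLDocument2)
        writer.write(New Object() {html})
        writer.close()

        ' Walk the anchors through the DOM, much as script in IE would.
        For Each element As IHTMLElement In doc.getElementsByTagName("a")
            Console.WriteLine("{0} -> {1}", element.innerText, element.getAttribute("href", 0))
        Next
    End Sub
End Module
```

Note that this ties the code to Windows and to the IE engine installed on the machine, which is worth weighing against a managed parser.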
If your HTML follows XHTML standards, you can do a lot of the parsing and processing using the System.Xml namespace classes.
If, on the other hand, what you're parsing is what web developers refer to as "tag soup," you'll need a third-party parser like HTML Agility Pack.
This may be only a partial solution to your problem if you're trying to figure out how a browser will interpret your HTML, as each browser parses tag soup slightly differently.
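As an illustration of the XHTML case, here is a small sketch using XDocument from System.Xml.Linq. The markup is an invented example; real pages with a DOCTYPE or named entities such as &nbsp; may need extra reader settings that are not shown here.

```vbnet
Imports System
Imports System.Xml.Linq

Module XhtmlSketch
    Sub Main()
        ' Illustrative, well-formed XHTML in the standard XHTML namespace.
        Dim xhtml As String =
            "<html xmlns=""http://www.w3.org/1999/xhtml""><body>" &
            "<p>See <a href=""/a"">First</a> and <a href=""/b"">Second</a>.</p>" &
            "</body></html>"

        Dim doc As XDocument = XDocument.Parse(xhtml)

        ' Elements live in the XHTML namespace, so queries must qualify the name.
        Dim ns As XNamespace = "http://www.w3.org/1999/xhtml"

        For Each link As XElement In doc.Descendants(ns + "a")
            Console.WriteLine("{0} -> {1}", link.Value, link.Attribute("href").Value)
        Next
    End Sub
End Module
```

Forgetting the namespace is the usual reason such a query returns nothing on XHTML input.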
Is it well formed? If the HTML is in fact well formed, then it can be parsed as XML. If it is tag soup with unclosed elements and the like, I think you will have to hunt around for a third-party solution.
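One simple way to answer the "is it well formed?" question in code is to attempt an XML parse and catch the failure. The helper name TryParseAsXml below is hypothetical, and the two input strings are just examples.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module WellFormedCheck
    ' Hypothetical helper: returns True and sets doc when the markup parses as XML.
    Function TryParseAsXml(markup As String, ByRef doc As XDocument) As Boolean
        Try
            doc = XDocument.Parse(markup)
            Return True
        Catch ex As XmlException
            ' Not well formed: the caller can fall back to a tag-soup parser.
            doc = Nothing
            Return False
        End Try
    End Function

    Sub Main()
        Dim parsed As XDocument = Nothing
        ' Every element is closed, so this parses.
        Console.WriteLine(TryParseAsXml("<p>closed <b>properly</b></p>", parsed))
        ' <br> and <p> are never closed, so the XML parser rejects it.
        Console.WriteLine(TryParseAsXml("<p>unclosed <br> element", parsed))
    End Sub
End Module
```

The first call prints True and the second prints False, which is the point at which a third-party parser would take over.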