How to parse HTML in VB.NET

Posted 2024-07-13 01:38:03 · 120 words · 13 views · 0 comments

I would like to know if there is a simple way to parse HTML in VB.NET.
I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?

Comments (5)

奈何桥上唱咆哮 2024-07-20 01:38:03

' Add a project reference to Microsoft.mshtml first.

' Then, in the code file:

Imports mshtml

' Copies each link's inner text into its title attribute and returns the body HTML.
Function parseMyHtml(ByVal htmlToParse As String) As String
    ' Load the markup into an in-memory MSHTML document.
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlToParse)
    htmlDocument.close()

    Dim allElements As IHTMLElementCollection = htmlDocument.body.all

    ' tags("a") narrows the collection to the anchor elements.
    Dim allAnchors As IHTMLElementCollection = allElements.tags("a")
    For Each element As IHTMLElement In allAnchors
        element.title = element.innerText
    Next

    Return htmlDocument.body.innerHTML
End Function

As found here:

¢好甜 2024-07-20 01:38:03

I like Html Agility Pack: it's very developer-friendly, free, and the source code is available.
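As a rough illustration of what that looks like from VB.NET, here is a minimal sketch that lists the links in a fragment of tag soup. It assumes the HtmlAgilityPack NuGet package is referenced; the module name, the ListLinks helper, and the sample markup are made up for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical helper: returns "text -> href" for every <a> that has an href.
    Function ListLinks(ByVal html As String) As List(Of String)
        ' Fully qualified to avoid clashing with other HtmlDocument types.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html) ' accepts markup with unclosed tags

        Dim results As New List(Of String)()
        ' SelectNodes returns Nothing, not an empty collection, when nothing matches.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return results

        For Each anchor As HtmlNode In anchors
            results.Add(anchor.InnerText.Trim() & " -> " & anchor.GetAttributeValue("href", ""))
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up input: the <li> elements are never closed.
        Dim html As String = "<ul><li><a href='https://example.com'>Example</a><li><a href='/docs'>Docs</a></ul>"
        For Each line As String In ListLinks(html)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

The XPath query is what gives this the XML-like feel the question asks for; the Nothing check matters because an input without links would otherwise throw in the For Each.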

星軌x 2024-07-20 01:38:03

Don't use the Agility Pack; just use the mshtml library to access the DOM. It is what IE uses, and it is great for walking through HTML elements.

The Agility Pack is nasty and unnecessarily hacky if you ask me; mshtml is the way to go. Look it up on MSDN.

陌路终见情 2024-07-20 01:38:03

If your HTML follows XHTML standards, you can do a lot of the parsing and processing using the System.Xml namespace classes.

If, on the other hand, what you're parsing is what web developers refer to as "tag soup," you'll need a third-party parser like HTML Agility Pack.

This may be only a partial solution to your problem if you're trying to figure out how a browser will interpret your HTML, as each browser parses tag soup slightly differently.
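For the XHTML case, a minimal sketch using XmlDocument might look like the following. It assumes the input is well-formed, declares the XHTML namespace, and has no DOCTYPE or named entities such as &nbsp; (those bring DTD handling into play, which is not covered here). The GetHrefs helper and the sample markup are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XhtmlSketch
    ' Hypothetical helper: returns the href of every <a> in a well-formed XHTML string.
    Function GetHrefs(ByVal xhtml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xhtml) ' throws XmlException if the markup is not well-formed

        ' XHTML elements live in a namespace, so XPath needs a prefix mapped to it.
        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml")

        Dim hrefs As New List(Of String)()
        For Each node As XmlNode In doc.SelectNodes("//x:a[@href]", ns)
            hrefs.Add(node.Attributes("href").Value)
        Next
        Return hrefs
    End Function

    Sub Main()
        ' Made-up sample input.
        Dim xhtml As String =
            "<html xmlns=""http://www.w3.org/1999/xhtml""><body>" &
            "<p><a href=""/docs"">Docs</a> and <a href=""/faq"">FAQ</a></p>" &
            "</body></html>"
        For Each href As String In GetHrefs(xhtml)
            Console.WriteLine(href)
        Next
    End Sub
End Module
```

If the document does not declare the XHTML namespace, the plain query "//a[@href]" without a namespace manager would be the one to use instead.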

梦醒灬来后我 2024-07-20 01:38:03

Is it well-formed? If the HTML is in fact well-formed, then it can be parsed as XML. If it is tag soup with unclosed elements and the like, I would think you would have to hunt around for a third-party solution.
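One way to answer that question in code is simply to try the XML parser and see whether it objects. The sketch below does that with XmlDocument; the IsWellFormed helper and the sample strings are made up for illustration, and a True result only means the XML parser accepted the input, not that it is valid HTML.

```vb
Imports System
Imports System.Xml

Module WellFormedSketch
    ' Hypothetical helper: True when the markup parses as XML, False when the parser rejects it.
    Function IsWellFormed(ByVal markup As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(markup)
            Return True
        Catch ex As XmlException
            Return False
        End Try
    End Function

    Sub Main()
        ' Every element is closed and properly nested.
        Console.WriteLine(IsWellFormed("<p>closed <b>properly</b></p>"))
        ' <b> and <br> are never closed, so the parser rejects this input.
        Console.WriteLine(IsWellFormed("<p>unclosed <b>bold<br></p>"))
    End Sub
End Module
```

When the check fails, that is the point at which falling back to a tolerant third-party parser makes sense.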
