Remove all div elements from a string using VB.NET

Posted 2025-01-18 15:17:27

I want to remove all div elements, including those with attributes such as class, from my string.
I already checked here, and regex is apparently not the answer: RegEx match open tags except XHTML self-contained tags

I already have a regex that strips all tags from a string (note: I never parse a full HTML document, if that matters) and preserves the content: Regex.Replace(s, "<[^>]*(>|$)", String.Empty). However, I only want the div tags removed, with their content preserved.

So I have:

<div class=""fade-content""><div><span>some  content</span></div></div>
<div>some  content</div> 

Desired output:

<span>some  content</span>
some  content

I was still going down the regex path and tried something like <div>.*<\/div>, but that does not match divs with attributes.

How can I remove div elements only, using VB.NET?


Comments (2)

仲春光 2025-01-25 15:17:27

There are several ways to do this. A short and simple one is:

Regex.Replace(s, "</?div.*?>", String.Empty)

Here is an example:

    ' s simulates your HTML input
    Dim s As String = "<div class=""fade-content""><div><span>some  content</span></div></div>" & Environment.NewLine & "<div>some  content</div>"

    ' Store the result in s1
    Dim s1 As String = System.Text.RegularExpressions.Regex.Replace(s, "</?div.*?>", String.Empty)

    ' Show the result
    MessageBox.Show(s1)

Output:

    <span>some  content</span>
    some  content
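
Note that the pattern "</?div.*?>" is case-sensitive, matches any tag whose name merely starts with "div" (for example a hypothetical <divider>), and stops matching if a tag's attributes span several lines. The following is a minimal sketch of a tighter variant, not a full HTML parser: the module name DivStripper and the function RemoveDivTags are made up for illustration, and it still assumes that no attribute value contains a ">" character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DivStripper
    ' Hypothetical helper; the names here are illustrative, not from the answer above.
    ' </?     an opening or a closing tag
    ' div\b   the tag name, ending at a word boundary (so <divider> is left alone)
    ' [^>]*   any attributes up to the end of the tag, including line breaks
    Private ReadOnly DivTag As New Regex("</?div\b[^>]*>", RegexOptions.IgnoreCase)

    Function RemoveDivTags(input As String) As String
        ' Remove only the tags; whatever sits between them is kept.
        Return DivTag.Replace(input, String.Empty)
    End Function

    Sub Main()
        Dim s As String = "<div class=""fade-content""><div><span>some  content</span></div></div>" &
                          Environment.NewLine & "<DIV>some  content</DIV>"
        Console.WriteLine(RemoveDivTags(s))
    End Sub
End Module
```

Because only the tags themselves are matched, nested divs need no special handling: each opening and closing tag is removed independently.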

神魇的王 2025-01-25 15:17:27

This can be achieved without regular expressions by using the Windows Forms WebBrowser control. The function below collects the inner HTML of each innermost div into a list. Try the following:

ExtractDesiredData:

Private Function ExtractDesiredData(html As String) As List(Of String)
    Dim result As List(Of String) = New List(Of String)()

    'create new instance
    Using wb As WebBrowser = New WebBrowser()
        wb.Navigate(New Uri("about:blank"))

        'create reference
        Dim doc As HtmlDocument = wb.Document

        'add html to document
        doc.Write(html)

        'loop through body elements
        For Each elem As HtmlElement In doc.Body.All
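            'skip empty divs and divs that still contain nested divs (case-insensitive check)
            If elem.TagName = "DIV" AndAlso elem.InnerHtml IsNot Nothing AndAlso elem.InnerHtml.IndexOf("<div", StringComparison.OrdinalIgnoreCase) < 0 Then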
                Debug.WriteLine($"DIV elem InnerHtml: '{elem.InnerHtml}'")

                'add
                result.Add(elem.InnerHtml)
            End If
        Next
    End Using

    Return result
End Function

Usage:

Dim html As String = "<div class=""fade-content""><div><span>some  content</span></div></div>"
html &= vbCrLf & "<div>some  content</div>"

Dim desiredData As List(Of String) = ExtractDesiredData(html)
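
ExtractDesiredData returns the inner HTML of the innermost divs as a list rather than the original string with the div tags removed. If a single string is needed and the fragment happens to be well-formed XML, a different parser-based technique is to unwrap the divs with LINQ to XML. This is only a sketch under that assumption (real-world HTML often is not well-formed XML), and the names DivUnwrapper and UnwrapDivs are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module DivUnwrapper
    ' Hypothetical helper, not part of the answer above.
    ' Assumes the fragment is well-formed XML (closed tags, quoted attributes).
    Function UnwrapDivs(fragment As String) As String
        ' A dummy root lets a fragment with several top-level nodes parse as one tree.
        Dim root As XElement = XElement.Parse("<root>" & fragment & "</root>", LoadOptions.PreserveWhitespace)

        ' Snapshot the divs, then work from the last one back so that nested
        ' divs are unwrapped before the divs that contain them.
        Dim divs = root.Descendants().
            Where(Function(e) String.Equals(e.Name.LocalName, "div", StringComparison.OrdinalIgnoreCase)).
            Reverse().
            ToList()

        For Each div As XElement In divs
            ' Replace the element with its own child nodes, dropping the tag and its attributes.
            div.ReplaceWith(div.Nodes())
        Next

        ' Serialize the children of the dummy root without re-indenting them.
        Return String.Concat(root.Nodes().Select(Function(n) n.ToString(SaveOptions.DisableFormatting)))
    End Function

    Sub Main()
        Console.WriteLine(UnwrapDivs("<div class=""fade-content""><div><span>some  content</span></div></div>"))
        Console.WriteLine(UnwrapDivs("<div>some  content</div>"))
    End Sub
End Module
```

Unlike the WebBrowser approach, this needs no Windows Forms reference, but XElement.Parse throws an XmlException on input that is not well-formed, so callers may want to catch that and fall back to another method.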
