Remove elements by class name using HtmlAgilityPack in C#

Posted 2024-10-20 16:07:04

I'm using the HTML Agility Pack to read the contents of my HTML document into a string. Once that is done, I would like to remove certain elements from that content by their class, but I have run into a problem.

My HTML looks like this:

<div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs">
            </div>
        </div>

        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>

Content goes here...
</div>

Now, I have used an XPath selector to get all the content within the <div id="wrapper"> element and read its InnerHtml property, like so:

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

From this point, I would like to remove the div with the class "breadCrumbContainer". However, when I use the code below, I get the error: "Node "" was not found in the collection"

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            node = node.RemoveChild(node.SelectSingleNode("//div[@class='breadCrumbContainer']"));

            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

Can anyone shed some light on this, please? I'm quite new to XPath, and really new to the Html Agility Pack library.

Thanks,

Dave

梦幻的味道 2024-10-27 16:07:04

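This happens because RemoveChild can only remove a direct child, not a grandchild. In your markup the breadCrumbContainer div is a child of the maincolumn div, not of wrapper, so calling RemoveChild on the wrapper node cannot find it. Try this instead: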

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
    node.ParentNode.RemoveChild(node);
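
As a slightly more defensive variant of the same idea, here is a minimal sketch that removes every matching div under the wrapper and guards against missing nodes. The class HtmlCleaner and the method RemoveByClass are hypothetical names chosen for illustration, not part of HtmlAgilityPack. The sketch assumes the class attribute holds exactly one class name and that SelectNodes returns null when nothing matches, which is its default behavior as far as I know.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlCleaner
{
    // Hypothetical helper (not part of HtmlAgilityPack): removes every <div>
    // below the wrapper whose class attribute equals className exactly, then
    // returns the wrapper's remaining inner HTML (or null if no wrapper exists).
    public static string RemoveByClass(string html, string wrapperId, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode($"//div[@id='{wrapperId}']");
        if (wrapper == null)
        {
            return null; // wrapper not found
        }

        // The leading "." keeps the search relative to wrapper; a bare "//"
        // would search the whole document again, even when called on wrapper.
        HtmlNodeCollection matches = wrapper.SelectNodes($".//div[@class='{className}']");

        // Assumption: SelectNodes yields null rather than an empty collection
        // when there are no matches, so check before iterating.
        if (matches != null)
        {
            // Iterate over a copy so removals do not disturb the enumeration.
            foreach (HtmlNode match in matches.ToList())
            {
                match.Remove(); // detaches the node from its own parent
            }
        }

        return wrapper.InnerHtml;
    }
}
```

Calling HtmlCleaner.RemoveByClass(html, "wrapper", "breadCrumbContainer") on the markup from the question should return the wrapper content without the breadcrumb div. If an element can carry several classes, the predicate would need to change to something like contains(concat(' ', normalize-space(@class), ' '), ' breadCrumbContainer ').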

做个少女永远怀春 2024-10-27 16:07:04

This is a super-simple task for XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "div[@class='breadCrumbContainer'
     and
       ancestor::div[@id='wrapper']
      ]
  "/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (with another <div> added and everything wrapped in an <html> top element to make it more challenging and realistic):

<html>
 <div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs"></div>
        </div>
        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>  Content goes here...
    </div>
 </div>
 <div>
   Something else here
 </div>
</html>

the wanted, correct result is produced:

<html>
  <div id="wrapper">
    <div class="maincolumn">
      <div class="seo_list">
        <div class="seo_head">Header</div>
      </div>  Content goes here...
    </div>
  </div>
  <div>
   Something else here
 </div>
</html>
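
To run a stylesheet like this from C#, one option is the XslCompiledTransform class that ships with .NET. The following is only a sketch: XsltRunner and Transform are hypothetical names, and it assumes the input is well-formed XML, which real-world HTML (including the sample in the question, with its unclosed div) often is not.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Hypothetical helper: applies an XSLT 1.0 stylesheet, passed as a string,
    // to a well-formed XML string and returns the transformed text.
    public static string Transform(string stylesheet, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries the stylesheet's xsl:output choices
        // (for example omit-xml-declaration and indent) over to the writer.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }

        // The writer has been disposed (and therefore flushed) at this point.
        return output.ToString();
    }
}
```

For markup that is not well-formed XML, parsing with HtmlAgilityPack and removing the node directly is likely the more practical route.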