Using Agility Pack to remove all elements with a specified class from HTML

Posted 2024-11-05 04:24:57

I'm trying to select all elements that have a given class and remove them from an HTML string.

This is what I have so far, but it doesn't seem to remove anything, even though the source clearly shows 4 elements with that class name.

using System.Linq;        // provides Any()
using HtmlAgilityPack;    // HtmlNodeCollection, HtmlNode

// Filter page HTML to display required content
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// pageHTML is a string containing the page's HTML
htmlDoc.LoadHtml(pageHTML);

// ParseErrors is an IEnumerable<HtmlParseError> containing any errors from the load
if (!htmlDoc.ParseErrors.Any())
{
    // Select all elements marked with the pdf-ignore class
    HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//body[@class='pdf-ignore']");

    // SelectNodes returns null (not an empty collection) when nothing matches
    if (nodes != null)
    {
        // Remove every node in the collection from the document
        foreach (var node in nodes)
        {
            node.Remove();
        }
    }
}

EDIT: To clarify, the document parses and the SelectNodes line is being hit; it just isn't returning anything.

Here is a snippet of the HTML:

<input type="submit" name="ctl00$MainContent$PrintBtn" value="Print Shotlist" onclick="window.print();" id="MainContent_PrintBtn" class="pdf-ignore">

清风疏影 2024-11-12 04:24:57

EDIT: In your updated question you posted part of the HTML string, an <input> element declaration, but you're trying to match a <body> element with the class pdf-ignore (according to your expression //body[@class='pdf-ignore']).
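A quick way to confirm this is to run both selectors against a small document. The sketch below is an illustration under stated assumptions: it uses the .NET base class library's XmlDocument instead of HtmlAgilityPack (assuming both evaluate XPath 1.0 predicates the same way), the markup is a hypothetical well-formed cut-down of the page, and Count is a hypothetical helper.

```csharp
using System;
using System.Xml;

// Assumption: XmlDocument stands in for HtmlAgilityPack; the markup must be
// well-formed XML so that XmlDocument can load it.
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<form><input type=\"submit\" id=\"MainContent_PrintBtn\" class=\"pdf-ignore\" /></form>" +
    "</body></html>");

// Hypothetical helper: how many nodes does an XPath expression select?
static int Count(XmlDocument d, string xpath) => d.SelectNodes(xpath)?.Count ?? 0;

int bodyMatches = Count(doc, "//body[@class='pdf-ignore']");   // <body> has no class attribute
int inputMatches = Count(doc, "//input[@class='pdf-ignore']"); // the print button itself

Console.WriteLine($"body: {bodyMatches}, input: {inputMatches}"); // prints "body: 0, input: 1"
```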

If you want to match all the elements in the document that carry this class, use:

var nodes = htmlDoc.DocumentNode.SelectNodes("//*[contains(@class,'pdf-ignore')]");

to get your nodes. This matches elements of any tag whose class attribute contains the specified class name.
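Because contains() is a substring test, the expression above also matches a class such as pdf-ignore-not. A minimal sketch compares it with the common whole-token idiom, which pads the normalized class list with spaces. It again uses the BCL's XmlDocument as a stand-in for HtmlAgilityPack (assuming identical XPath 1.0 semantics); the markup and the Count helper are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical markup: one multi-class element, one exact match, one near-miss.
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<div class=\"header pdf-ignore\">nav</div>" +
    "<input type=\"submit\" class=\"pdf-ignore\" />" +
    "<p class=\"pdf-ignore-not\">keep me</p>" +
    "<p>content</p>" +
    "</body></html>");

// Hypothetical helper: how many nodes does an XPath expression select?
static int Count(XmlDocument d, string xpath) => d.SelectNodes(xpath)?.Count ?? 0;

// Substring test: matches any tag, including the "pdf-ignore-not" paragraph.
int substringMatches = Count(doc, "//*[contains(@class,'pdf-ignore')]");

// Whole-token test: " header pdf-ignore " contains " pdf-ignore ", " pdf-ignore-not " does not.
int tokenMatches = Count(doc,
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' pdf-ignore ')]");

Console.WriteLine($"substring: {substringMatches}, token: {tokenMatches}"); // prints "substring: 3, token: 2"
```

Either expression can be passed to HtmlAgilityPack's DocumentNode.SelectNodes; by default that method returns null rather than an empty collection when nothing matches, so check the result before iterating.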

Your code seems to be correct except for one detail: the condition htmlDoc.ParseErrors == null. You select and remove nodes ONLY if the ParseErrors property (of type IEnumerable<HtmlParseError>) is null, but when no errors are found this property actually returns an empty collection, not null. So changing your code to:

if (!htmlDoc.ParseErrors.Any())
{
    // some logic here
}

should solve the issue.
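The difference between the two conditions can be shown in isolation. In this sketch a List<string> is a hypothetical stand-in for the IEnumerable<HtmlParseError> that ParseErrors returns after a clean parse; the last check is one defensive variant that also tolerates null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ParseErrors after a clean parse: empty, not null.
IEnumerable<string> parseErrors = new List<string>();

bool oldCondition = parseErrors == null;         // false, so the removal block never runs
bool newCondition = !parseErrors.Any();          // true, so the removal block runs
bool defensive = parseErrors?.Any() != true;     // true for both null and empty sequences

Console.WriteLine($"== null: {oldCondition}, !Any(): {newCondition}, defensive: {defensive}");
```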

最笨的告白 2024-11-12 04:24:57

Your XPath might not be matching: have you tried "//div[class='pdf-ignore']" (without the "@")?
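In XPath 1.0, a bare name inside a predicate such as [class='pdf-ignore'] tests for a child element named class, whereas [@class='pdf-ignore'] tests the element's own class attribute. This sketch uses the .NET XmlDocument as a stand-in for HtmlAgilityPack (assuming the same XPath 1.0 semantics), with hypothetical markup and a hypothetical Count helper, to show the two predicates side by side.

```csharp
using System;
using System.Xml;

// Hypothetical markup: a <div> carrying the class as an attribute.
var doc = new XmlDocument();
doc.LoadXml("<body><div class=\"pdf-ignore\">ignored</div></body>");

// Hypothetical helper: how many nodes does an XPath expression select?
static int Count(XmlDocument d, string xpath) => d.SelectNodes(xpath)?.Count ?? 0;

// Without "@": is there a <class> child element whose text is 'pdf-ignore'?
int childElementTest = Count(doc, "//div[class='pdf-ignore']");
// With "@": does the <div> itself have class="pdf-ignore"?
int attributeTest = Count(doc, "//div[@class='pdf-ignore']");

Console.WriteLine($"[class=...]: {childElementTest}, [@class=...]: {attributeTest}"); // prints "[class=...]: 0, [@class=...]: 1"
```

Independently of the "@", //div only considers <div> elements, so it would not select the <input> shown in the question.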

About the author: 不甘平庸
