Using XPath and HtmlAgilityPack to find all elements whose inner text contains a specific word

Posted 2024-12-27 22:09:19

I am trying to build a simple search engine using HtmlAgilityPack and XPath with C# (.NET 4).
I want to find every node containing a user-defined search word, but I can't seem to get the XPath right.
For example:

<HTML>
 <BODY>
  <H1>Mr T for president</H1>
  <div>We believe the new president should be</div>
  <div>the awsome Mr T</div>
  <div>
   <H2>Mr T replies:</H2>
   <p>I pity the fool who doesn't vote</p>
   <p>for Mr T</p>
  </div>
 </BODY>
</HTML>

If the specified search word is "Mr T", I want the following nodes: <H1>, the second <div>, <H2> and the second <p>.
I have tried numerous variants of doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]"); but I always seem to end up with every single node in the entire DOM.

Any hints to point me in the right direction would be much appreciated.

内心荒芜 2025-01-03 22:09:19

Use:

//*[text()[contains(., 'Mr T')]]

This selects all elements in the XML document that have a text-node child which contains the string 'Mr T'.

An equivalent way to write this is:

//text()[contains(., 'Mr T')]/..

This selects the parent(s) of any text node that contains the string 'Mr T'.
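
As a minimal sketch of how either expression might be used from C#, the program below loads the sample markup from the question into HtmlAgilityPack and collects the name of each matching element. The class name `MrTSearch` and the helper `FindElementNames` are made up for illustration; only `HtmlDocument.LoadHtml` and `SelectNodes` come from the library. `SelectNodes` returns null rather than an empty collection when nothing matches, so the result is checked before use.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class MrTSearch
{
    // Sample markup from the question.
    public const string SampleHtml = @"<HTML>
 <BODY>
  <H1>Mr T for president</H1>
  <div>We believe the new president should be</div>
  <div>the awsome Mr T</div>
  <div>
   <H2>Mr T replies:</H2>
   <p>I pity the fool who doesn't vote</p>
   <p>for Mr T</p>
  </div>
 </BODY>
</HTML>";

    // Hypothetical helper: names of the elements that have a text-node child
    // containing searchWord. Assumes searchWord contains no single quote.
    public static List<string> FindElementNames(string html, string searchWord)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string xpath = "//*[text()[contains(., '" + searchWord + "')]]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var names = new List<string>();
        if (nodes == null) return names; // null means "no match", not an error
        foreach (HtmlNode node in nodes)
        {
            names.Add(node.Name);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in FindElementNames(SampleHtml, "Mr T"))
        {
            Console.WriteLine(name);
        }
    }
}
```

For the sample this should list the H1, the second div, the H2 and the second p, which is the set the question asks for. HtmlAgilityPack reports element names in lower case, and the wrapper div is skipped because its own text children are whitespace only.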

十六岁半 2025-01-03 22:09:19

In XPath, to find a specific keyword, use the following form (where "keyword" is the word you want to search for):

//*[text()[contains(., 'keyword')]]

Use the same form in C#, where keyword is a string variable holding the search term:

doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
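
Concatenating the keyword straight into the expression works only while the keyword contains no single quote, because XPath 1.0 string literals have no escape sequence. The quotes themselves also matter: an unquoted word is parsed as an element name, which evaluates to an empty node-set, converts to the empty string, and makes contains() true for every node. One common workaround, sketched here with a hypothetical helper named `ToXPathLiteral` (not part of HtmlAgilityPack), is to pick whichever quote style the keyword does not use and fall back to concat() when it uses both.

```csharp
using System;
using System.Text;

public static class XPathText
{
    // Hypothetical helper: wraps value as an XPath 1.0 string literal.
    public static string ToXPathLiteral(string value)
    {
        if (!value.Contains("'")) return "'" + value + "'";
        if (!value.Contains("\"")) return "\"" + value + "\"";

        // Both quote kinds present: split on ' and glue the pieces with concat().
        var sb = new StringBuilder("concat(");
        string[] parts = value.Split('\'');
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(", \"'\", ");
            sb.Append("'").Append(parts[i]).Append("'");
        }
        return sb.Append(")").ToString();
    }

    // Builds the expression from the answer with a safely quoted keyword.
    public static string BuildContainsQuery(string keyword)
    {
        return "//*[text()[contains(., " + ToXPathLiteral(keyword) + ")]]";
    }

    public static void Main()
    {
        Console.WriteLine(BuildContainsQuery("Mr T"));
        Console.WriteLine(BuildContainsQuery("doesn't"));
    }
}
```

The resulting string can be passed to `doc.DocumentNode.SelectNodes` in place of the hand-concatenated expression.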

黑白记忆 2025-01-03 22:09:19

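Use the following:

doc.DocumentNode.SelectNodes("//*[contains(text()[1], '" + searchword + "')]")

This selects all elements (*) whose first text child (text()[1]) contains the search word. The single quotes around searchword are required; without them XPath does not treat the word as a string.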

千紇 2025-01-03 22:09:19

Case-insensitive solution:

var xpathForFindText =
"//*[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" + lowerFocusKwd + "')]]";

var result = doc.DocumentNode.SelectNodes(xpathForFindText);
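
To make the snippet above concrete, here is a sketch that lowercases the keyword before building the expression. `CaseInsensitiveSearch` and `FindIgnoringCase` are illustrative names, and the markup in `Main` is invented for the demonstration. The translate() call folds only the 26 ASCII letters, so accented or non-Latin text is still compared case-sensitively, and the keyword is assumed to contain no single quote.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class CaseInsensitiveSearch
{
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Hypothetical helper: trimmed inner text of every element that has a
    // text-node child containing keyword, ignoring ASCII letter case.
    public static List<string> FindIgnoringCase(string html, string keyword)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Lowercase the keyword once; translate() lowercases each text node.
        string lowerFocusKwd = keyword.ToLowerInvariant();
        string xpath =
            "//*[text()[contains(translate(., '" + Upper + "', '" + Lower + "'), '"
            + lowerFocusKwd + "')]]";

        var texts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return texts; // nothing matched
        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Invented sample markup, not taken from the question.
        string html = "<div><h1>Mr T for president</h1><p>vote for MR T</p><p>nobody else</p></div>";
        foreach (string text in FindIgnoringCase(html, "mr t"))
        {
            Console.WriteLine(text);
        }
    }
}
```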

Note:

Be careful: lowerFocusKwd must not contain the following characters, because they would make the XPath expression malformed:
