Using XPath and HtmlAgilityPack to find all elements whose inner text contains a specific word
I am trying to build a simple search engine using HtmlAgilityPack and XPath with C# (.NET 4).
I want to find every node containing a user-defined search word, but I can't seem to get the XPath right.
For example:
<HTML>
<BODY>
<H1>Mr T for president</H1>
<div>We believe the new president should be</div>
<div>the awsome Mr T</div>
<div>
<H2>Mr T replies:</H2>
<p>I pity the fool who doesn't vote</p>
<p>for Mr T</p>
</div>
</BODY>
</HTML>
If the specified search word is "Mr T", I'd want the following nodes: <H1>, the second <div>, <H2>, and the second <p>.
I have tried numerous variants of doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]");
but I always seem to wind up with every single node in the entire DOM.
Any hints to get me in the right direction would be much appreciated.
Comments (4)
Use:
//*[text()[contains(., 'Mr T')]]
This selects all elements in the XML document that have a text-node child which contains the string 'Mr T'.
This can also be written shorter as:
//text()[contains(., 'Mr T')]/..
This selects the parent(s) of any text node that contains the string 'Mr T'.
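To make the two expressions concrete, here is a minimal sketch that runs them against the sample page from the question. The class and method names (TextSearchSketch, MatchNames) are made up for illustration; only HtmlDocument, LoadHtml, DocumentNode, and SelectNodes come from HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TextSearchSketch
{
    // The sample page from the question.
    public const string SampleHtml = @"<HTML>
<BODY>
<H1>Mr T for president</H1>
<div>We believe the new president should be</div>
<div>the awsome Mr T</div>
<div>
<H2>Mr T replies:</H2>
<p>I pity the fool who doesn't vote</p>
<p>for Mr T</p>
</div>
</BODY>
</HTML>";

    // Hypothetical helper: returns the names of the elements matched by an XPath expression.
    public static List<string> MatchNames(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new List<string>() : nodes.Select(n => n.Name).ToList();
    }

    public static void Main()
    {
        // Elements that have a text-node child containing 'Mr T'.
        var viaPredicate = MatchNames(SampleHtml, "//*[text()[contains(., 'Mr T')]]");
        // Parents of the text nodes containing 'Mr T'.
        var viaParent = MatchNames(SampleHtml, "//text()[contains(., 'Mr T')]/..");
        Console.WriteLine(string.Join(", ", viaPredicate));
        Console.WriteLine(string.Join(", ", viaParent));
    }
}
```

On the sample page both calls should pick out the H1, the second div, the H2, and the second p. HtmlAgilityPack reports element names in lowercase through Name, so the printed names are lowercase.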
According to XPath, to find a specific keyword you need to follow a fixed format, where "keyword" is the word you want to search for.
The same format applies in C#, with keyword being the string variable you pass in.
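The snippet for this comment did not survive, so the following is only a sketch of what building the expression from a C# string variable might look like. The expression shape and the names KeywordXPath and ForKeyword are assumptions. The point it illustrates is that the keyword has to end up inside quotes. In the question's attempt the search word is concatenated without quotes, so a single-word term such as Mr is parsed as a child-element name rather than a string. That name matches nothing, the empty node-set converts to an empty string, and contains(., '') is true for every node, which would explain getting the whole DOM back.

```csharp
using System;

public static class KeywordXPath
{
    // Hypothetical helper: wraps the keyword in single quotes so that XPath
    // treats it as a string literal. Assumes the keyword has no apostrophe.
    public static string ForKeyword(string keyword)
    {
        if (keyword.Contains("'"))
            throw new ArgumentException("Apostrophes are not handled by this sketch.", "keyword");
        return "//*[text()[contains(., '" + keyword + "')]]";
    }

    public static void Main()
    {
        // Prints: //*[text()[contains(., 'Mr T')]]
        Console.WriteLine(ForKeyword("Mr T"));
    }
}
```

The resulting string can then be passed to doc.DocumentNode.SelectNodes(...), remembering that SelectNodes returns null when nothing matches.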
Use an XPath expression that selects all elements (*) whose first text child (text()[1]) contains the search word.
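Restricting the test to text()[1] changes which elements match when an element mixes text and child elements. The sketch below compares the two variants on a small made-up fragment. The exact expression from this comment was lost, so //*[contains(text()[1], ...)] is an assumed reading of the description, and FirstTextChildSketch, Html, and Count are hypothetical names.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class FirstTextChildSketch
{
    // Made-up fragment: in the first <p> the search word sits in the SECOND text child.
    public const string Html =
        "<p>for <b>president</b>: Mr T</p>" +
        "<p>Mr T <b>replies</b></p>";

    // Hypothetical helper: number of nodes matched by an XPath expression.
    public static int Count(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes(xpath); // null when nothing matches
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        string searchword = "Mr T";
        // Only the first text child of each element is inspected.
        string firstOnly = "//*[contains(text()[1], '" + searchword + "')]";
        // Any text child of each element is inspected.
        string anyText = "//*[text()[contains(., '" + searchword + "')]]";
        Console.WriteLine(Count(Html, firstOnly));
        Console.WriteLine(Count(Html, anyText));
    }
}
```

With this fragment the first-text-only variant should match just the second p, while the any-text variant should match both. Which behaviour is wanted depends on whether text after an inline child element should count as a hit.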
Case-insensitive solution:
var xpathForFindText =
    "//*[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" + lowerFocusKwd + "')]]";
var result = doc.DocumentNode.SelectNodes(xpathForFindText);
Note: be careful, lowerFocusKwd must not contain an apostrophe ('), because the resulting XPath would be malformed.
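One possible way around the apostrophe restriction is to turn the search word into a proper XPath string literal before concatenating it. XPath 1.0 has no escape character inside string literals, so a value containing both quote kinds has to be assembled with concat(). The sketch below is one such approach; XPathText, ToLiteral, and CaseInsensitive are hypothetical names and not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;

public static class XPathText
{
    // Hypothetical helper: renders an arbitrary string as an XPath 1.0 string expression.
    public static string ToLiteral(string value)
    {
        if (!value.Contains("'")) return "'" + value + "'";     // no apostrophe: single-quote it
        if (!value.Contains("\"")) return "\"" + value + "\"";  // no double quote: double-quote it
        // Both quote kinds present: split on apostrophes and glue the pieces
        // back together with a double-quoted apostrophe inside concat().
        var parts = value.Split('\'').Select(p => "'" + p + "'");
        return "concat(" + string.Join(", \"'\", ", parts) + ")";
    }

    // Case-insensitive variant of the expression above; translate() only folds ASCII A-Z.
    public static string CaseInsensitive(string keyword)
    {
        string lowered = keyword.ToLowerInvariant();
        return "//*[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
             + "'abcdefghijklmnopqrstuvwxyz'), " + ToLiteral(lowered) + ")]]";
    }

    public static void Main()
    {
        Console.WriteLine(ToLiteral("Mr T"));       // 'Mr T'
        Console.WriteLine(ToLiteral("fool's"));     // "fool's"
        Console.WriteLine(ToLiteral("a'b\"c"));     // concat('a', "'", 'b"c')
        Console.WriteLine(CaseInsensitive("Mr T"));
    }
}
```

The string returned by CaseInsensitive can be passed to doc.DocumentNode.SelectNodes(...) in place of the hand-concatenated expression. Because translate() here lists only the 26 ASCII letters, accented or non-Latin characters are compared case-sensitively.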