Using XPath with PHP's SimpleXML to find nodes containing a string
I am trying to use SimpleXML in combination with XPath to find nodes that contain a certain string.
<?php
$xhtml = <<<EOC
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Test</title>
</head>
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;
$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$nodes = $xml->xpath("//*[contains(text(), 'Find me')]");
echo count($nodes);
Expected output: 2
Actual output: 1
When I change the xhtml of the second paragraph to
<p>
Find me!
<br />
</p>
then it works as expected. What does my XPath expression have to look like to match all nodes containing 'Find me', no matter where they are?
Using PHP's DOM-XML is an option, but not desired.
Thanks in advance!
Comments (4)
It depends on what you want to do. You could select all the <p/> elements that contain "Find me" in any of their descendants with an expression that tests . (the element's whole text content) instead of text(). Note that this also matches ancestors: if you don't specify the kind of node, it will return <body/> and <html/> as well.

Or perhaps you want any node which has a child (not a descendant) text node that contains "Find me". That one will not return <html/> or <body/>.

I forgot to mention that . represents the whole text content of a node, while text() is used to retrieve [a nodeset of] text nodes. The problem with your expression contains(text(), 'Find me') is that contains() only works on strings, not nodesets, and therefore it converts text() to the value of the first node, which is why removing the first <br/> makes it work.
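The XPath expressions themselves did not survive in this copy of the answer, so the following is only a sketch of the distinctions described above, not the answerer's exact code. It assumes a trimmed version of the question's document and compares the original expression with one that tests . and one that tests each child text node; the variable names and the $names helper are made up for illustration.

```php
<?php
// Trimmed copy of the question's document (DOCTYPE and <meta> left out).
$xhtml = <<<EOC
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test</title></head>
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// Hypothetical helper: reduce a result list to element names for easy comparison.
$names = function (array $nodes): array {
    return array_map(fn($n) => $n->getName(), $nodes);
};

// text() is converted to the string value of the FIRST text child only.
$firstTextOnly = $names($xml->xpath("//*[contains(text(), 'Find me')]"));

// . is the whole text content, so ancestors such as html and body match too.
$anyDescendant = $names($xml->xpath("//*[contains(., 'Find me')]"));

// Restricting the element name avoids the ancestor matches.
$paragraphs = $names($xml->xpath("//xhtml:p[contains(., 'Find me')]"));

// Test every child text node individually instead of only the first one.
$childText = $names($xml->xpath("//*[text()[contains(., 'Find me')]]"));

print_r([$firstTextOnly, $anyDescendant, $paragraphs, $childText]);
```

With this input, the first query finds only the first paragraph, the second also returns html and body, and the last two return both paragraphs.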
Err, umm? But thanks @Jordy for the quick answer.
First, that's DOM-XML, which is not desired, since everything else in my script is done with SimpleXML.
Second, why do you translate to uppercase and then search for the unchanged string 'Find me'? Searching for 'FIND ME' would actually give a result.
But you pointed me in the right direction:
does the trick!
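On the uppercase point raised above: XPath 1.0 has no case-folding function, so a case-insensitive contains() is usually built with translate(), and the search string has to be folded the same way as the node text. The following is a minimal SimpleXML-only sketch under that assumption; the sample document and variable names are invented for illustration, and only ASCII letters are handled.

```php
<?php
// Hypothetical sample data, not taken from the question.
$xml = simplexml_load_string(
    '<root><p>Find me!</p><p>find ME too</p><p>Nothing here</p></root>'
);

// translate() maps each lowercase ASCII letter to its uppercase counterpart.
$lower = 'abcdefghijklmnopqrstuvwxyz';
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// The needle must be uppercased too, otherwise nothing can match.
// It is interpolated into the query, so it must not contain a single quote.
$needle = strtoupper('Find me');

$query   = "//p[contains(translate(., '$lower', '$upper'), '$needle')]";
$matches = $xml->xpath($query);

echo count($matches); // prints 2
```

Searching the uppercased text for the unchanged 'Find me' would return nothing, which is the mismatch pointed out in the comment.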
I was looking for a way to find whether a node with the exact value "Find Me" exists, and this seemed to work.
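The expression this comment refers to is not preserved here. As one possible way to do an exact-value check with SimpleXML, the sketch below compares each child text node after normalize-space(), so that surrounding line breaks do not defeat the comparison. The literal 'Find me!' and the variable names are assumptions chosen to fit the question's document.

```php
<?php
// Trimmed copy of the question's document.
$xhtml = <<<EOC
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);

// Strict comparison against the element's whole text content:
// the second paragraph fails because of its surrounding line breaks.
$strict = $xml->xpath("//*[. = 'Find me!']");

// Compare each child text node after trimming and collapsing whitespace.
$normalized = $xml->xpath("//*[text()[normalize-space(.) = 'Find me!']]");

// xpath() returns an array, so an existence check is a non-empty test.
$exists = !empty($normalized);

echo count($strict), ' ', count($normalized); // prints "1 2"
```

Whether the strict or the whitespace-tolerant comparison is appropriate depends on how the source markup is formatted.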