Using XPath with PHP's SimpleXML to find nodes containing a string

Posted 2024-09-19 03:17:45, 1102 characters, 5 views, 0 comments

I am trying to use SimpleXML in combination with XPath to find nodes that contain a certain string.

<?php
$xhtml = <<<EOC
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />
        <title>Test</title>
    </head>
    <body>
        <p>Find me!</p>
        <p>
            <br />
            Find me!
            <br />
        </p>
    </body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

$nodes = $xml->xpath("//*[contains(text(), 'Find me')]");

echo count($nodes);

Expected output: 2
Actual output: 1

When I change the xhtml of the second paragraph to

<p>
    Find me!
    <br />
 </p>

then it works as expected. What does my XPath expression have to look like to match all nodes containing 'Find me', no matter where they are?

Using PHP's DOM-XML is an option, but not desired.

Thanks in advance!

鸵鸟症 2024-09-26 03:17:45

It depends on what you want to do. You could select all the <p/> elements that contain "Find me" in any of their descendants with

//xhtml:p[contains(., 'Find me')]

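This can return more nodes than you expect: if you do not specify the kind of element (for example, if you use //* instead of //xhtml:p), the result will also include <body/> and <html/>, because their text content includes the text of their descendants.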

Or perhaps you want any node which has a child (not a descendant) text node that contains "Find me"

//*[text()[contains(., 'Find me')]]

This one will not return <html/> or <body/>.

I forgot to mention that . represents the whole text content of a node, while text() retrieves a node-set of its child text nodes. The problem with your expression contains(text(), 'Find me') is that contains() works on strings, not node-sets, so text() is converted to the string value of its first node only. In your second paragraph that first text node is the whitespace before the first <br/>, which is why removing the first <br/> makes it work.
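To make the difference between these expressions concrete, here is a minimal sketch that counts what each one selects. It assumes a reduced version of the question's document (DOCTYPE and head omitted for brevity); the $counts array and its keys are only illustrative names.

```php
<?php
// Reduced version of the question's document (assumption: DOCTYPE and <head> left out).
$xhtml = <<<EOC
<html xmlns="http://www.w3.org/1999/xhtml">
    <body>
        <p>Find me!</p>
        <p>
            <br />
            Find me!
            <br />
        </p>
    </body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

$counts = [];

// Only <p> elements, tested against their whole text content.
$counts['p_by_string_value'] = count($xml->xpath("//xhtml:p[contains(., 'Find me')]"));

// Any element, tested against its whole text content: ancestors match too.
$counts['any_by_string_value'] = count($xml->xpath("//*[contains(., 'Find me')]"));

// The question's expression: only the FIRST child text node is tested.
$counts['first_text_only'] = count($xml->xpath("//*[contains(text(), 'Find me')]"));

// Any element with at least one child text node containing the string.
$counts['any_child_text'] = count($xml->xpath("//*[text()[contains(., 'Find me')]]"));

foreach ($counts as $label => $count) {
    echo $label, ': ', $count, "\n";
}
```

The second count is larger than the others because <html/> and <body/> are matched through the text of their descendants, which is the effect described above.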


小猫一只 2024-09-26 03:17:45

Err, umm? But thanks @Jordy for the quick answer.

First, that's DOM-XML, which is not desired, since everything else in my script is done with SimpleXML.

Second, why do you translate to uppercase and then search for the unchanged string 'Find me'? Searching for 'FIND ME' would actually give a result.

But you pointed me in the right direction:

$nodes = $xml->xpath("//text()[contains(., 'Find me')]");

does the trick!
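One thing worth knowing about this query, as far as I can tell from how SimpleXML behaves: when an XPath expression selects text nodes, xpath() hands back the parent element of each text node rather than the text node itself. A small sketch, using a simplified document without a namespace (an assumption made to keep it short):

```php
<?php
// Simplified stand-in for the question's document (no namespace, no DOCTYPE).
$xml = simplexml_load_string(
    '<body><p>Find me!</p><p><br />Find me!<br /></p><p>Nothing here</p></body>'
);

// Select every text node that contains the search string.
$nodes = $xml->xpath("//text()[contains(., 'Find me')]");

// Collect the element name behind each result.
$names = [];
foreach ($nodes as $node) {
    $names[] = $node->getName();
}

echo count($nodes), " match(es): ", implode(', ', $names), "\n";
```

So the result can be used like any other list of SimpleXMLElement objects, for example to read attributes of the matching paragraphs.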


知足的幸福 2024-09-26 03:17:45

I was looking for a way to find whether a node with exact value "Find Me" exists and this seemed to work.

$node = $xml->xpath("//text()[.='Find Me']");
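Note that an exact comparison is sensitive to case and to surrounding whitespace. The sketch below is not part of the answer above; it assumes a simplified document and uses the standard XPath 1.0 function normalize-space() to ignore indentation around the text.

```php
<?php
// Simplified document; the second <p> has whitespace around its text, as in the question.
$xml = simplexml_load_string(
    "<body><p>Find me!</p><p>\n  <br />\n  Find me!\n  <br />\n</p></body>"
);

// Strict comparison: the text node must equal the string exactly.
$strict = $xml->xpath("//text()[. = 'Find me!']");

// Whitespace-tolerant comparison: leading/trailing whitespace is stripped first.
$normalized = $xml->xpath("//text()[normalize-space(.) = 'Find me!']");

echo 'strict: ', count($strict), "\n";
echo 'normalized: ', count($normalized), "\n";
```

The strict form only finds the first paragraph here, while the normalized form also finds the indented text in the second one.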


飞烟轻若梦 2024-09-26 03:17:45

    $doc = new DOMDocument();
    $doc->loadHTML($xhtml);

    $xPath = new DOMXpath($doc);
    $xPathQuery = "//text()[contains(translate(.,'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'Find me')]";
    $elements = $xPath->query($xPathQuery);

    if ($elements->length > 0) {
        foreach ($elements as $element) {
            print "Found: " . $element->nodeValue . "<br />";
        }
    }
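For comparison, here is a sketch of the same case-insensitive idea done with SimpleXML, with the search string uppercased as well so that it can actually match the translated text. The document and the variable names ($lower, $upper, $needle) are illustrative assumptions, and translate() as used here only folds ASCII letters.

```php
<?php
// Simplified document with mixed-case variants of the search string.
$xml = simplexml_load_string(
    '<body><p>Find me!</p><p>FIND ME too</p><p>Other</p></body>'
);

$lower = 'abcdefghijklmnopqrstuvwxyz';
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Uppercase the needle too; it must not contain a single quote,
// since it is embedded in the XPath string literal below.
$needle = strtoupper('Find me');

// Uppercase each child text node, then look for the uppercased needle.
$query = "//*[text()[contains(translate(., '$lower', '$upper'), '$needle')]]";
$nodes = $xml->xpath($query);

foreach ($nodes as $node) {
    echo 'Found: ', (string) $node, "\n";
}
```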
