XPath string for all children of a namespaced element
I'm just getting started with XPath and with its implementation for PHP's SimpleXML objects. Right now I'm using //zuq:* to create an array of SimpleXML objects with the zuq prefix in a given document. However, I'd like the SimpleXML objects to reference all descendants regardless of namespace. I tried //child::zuq:*, but the SimpleXML trees it creates don't seem to be complete.
Essentially, the objects captured should be all the top-level objects of the zuq namespace throughout the document, containing all descendant elements regardless of namespace, including zuq.
tl;dr: How can I create a SimpleXML object tree from a given document where each SimpleXML root object is the highest-level document element of a given namespace (such as zuq), containing all descendants of that element regardless of the descendant's namespace? XPath is not a requirement, but it appears to be the best choice based on my reading.
test.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>Heading</h1>
<p>Paragraph</p>
<zuq:region name="myRegion">
<div class="myClass">
<h1><zuq:data name="myDataHeading" /></h1>
<p>
<zuq:data name="myDataParagraph">
<zuq:format type="trim">
<zuq:param name="length" value="200" />
<zuq:param name="append">
<span class="paragraphTrimOverflow">...</span>
</zuq:param>
</zuq:format>
</zuq:data>
</p>
</div>
</zuq:region>
</body>
</html>
$sxml = simplexml_load_file('test.html');
$sxml_zuq = $sxml->xpath('//zuq:*/descendant-or-self::node()');
print_r($sxml_zuq);
Produces:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object //I don't know why these don't contain their zuq descendants
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[1] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[2] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[3] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[4] => SimpleXMLElement Object
(
)
[5] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataHeading
)
)
[6] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[7] => SimpleXMLElement Object
(
)
[8] => SimpleXMLElement Object
(
)
[9] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[10] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[11] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[12] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[13] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => length
[value] => 200
)
)
[14] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[15] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[16] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[17] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[18] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[19] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[20] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[21] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[22] => SimpleXMLElement Object
(
)
[23] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[24] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
)
Comments (2)
Don't trust the output of print_r: it appears to show an empty object, but in my testing the children are actually still there. For example, starting with your code above:
If I subsequently try a command like this:
I get this output:
It seems to be empty, right? But if I modify the command to look like this:
I get the resultant tree with the namespaced child:
I'm not 100% sure why this is; it probably has something to do with print_r trying to flatten the SimpleXML object and not handling the namespaces properly. But when you work with the SimpleXML objects themselves that are returned from your xpath call, all of the children are preserved.
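A minimal sketch of this observation (not the answerer's original snippets), assuming a trimmed-down copy of test.html held in a string. The variable names are illustrative. It relies on SimpleXMLElement::children() with a prefix argument to reach the zuq children that print_r does not display.

```php
<?php
// Trimmed-down stand-in for test.html (assumption: only the parts that
// matter for the namespace issue are kept).
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
  <body>
    <zuq:region name="myRegion">
      <div class="myClass">
        <h1><zuq:data name="myDataHeading" /></h1>
      </div>
    </zuq:region>
  </body>
</html>
XML;

$sxml   = simplexml_load_string($xml);
$region = $sxml->xpath('//zuq:region')[0];
$h1     = $region->div->h1;

// print_r only lists children in the namespace currently being viewed,
// so the h1 element looks empty here.
print_r($h1);

// Counting children without a namespace argument also skips zuq:data.
$defaultCount = count($h1->children());

// Asking for the zuq prefix explicitly reveals the child element.
$zuqChildren = $h1->children('zuq', true);
$zuqCount    = count($zuqChildren);
$zuqName     = $zuqChildren->getName();

// Serializing the node shows the complete subtree, whatever the namespaces.
echo $h1->asXML(), "\n";
```

Passing true as the second argument to children() tells SimpleXML that 'zuq' is a prefix; passing the namespace URI http://localhost/zuq without that flag should select the same children.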
As for your XPath itself, you probably DON'T want the descendant-or-self axis, because it will match not only the top-level zuq element but also all of its descendants, creating a larger array than you're actually looking for (unless I'm misunderstanding what you're asking). If you try something like this:
then you'll get back an array of ONLY the top-level zuq-namespaced elements. (While your example XML has only one such top-level element, your actual data may have several siblings at that level.) You can then capture the content of each of these top-level elements like this:
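One expression that behaves this way is //zuq:*[not(ancestor::zuq:*)]. The sketch below is an assumption about how the two steps could look, not necessarily the expression the answer used. It selects the top-level zuq elements and captures each subtree with asXML(); the $captured array and the sample string are illustrative.

```php
<?php
// Compact stand-in for test.html (assumption: shortened for the example).
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
  <body>
    <zuq:region name="myRegion">
      <div class="myClass">
        <h1><zuq:data name="myDataHeading" /></h1>
        <p><zuq:data name="myDataParagraph"><zuq:format type="trim" /></zuq:data></p>
      </div>
    </zuq:region>
  </body>
</html>
XML;

$sxml = simplexml_load_string($xml);

// Every element with the zuq prefix, at any depth.
$allZuq = $sxml->xpath('//zuq:*');

// Only zuq elements that have no zuq ancestor, i.e. the top-level ones.
$topZuq = $sxml->xpath('//zuq:*[not(ancestor::zuq:*)]');

// Capture the full content of each top-level element, keyed by its name attribute.
$captured = [];
foreach ($topZuq as $el) {
    $captured[(string) $el['name']] = $el->asXML();
}

echo count($allZuq), " zuq elements, ", count($topZuq), " at the top level\n";
echo $captured['myRegion'], "\n";
```

Each string in $captured holds the whole subtree, including the XHTML and nested zuq elements, and can be loaded again with simplexml_load_string once the namespace declarations are restored on its root element.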
Things get a little trickier if you want to repeat this process but search for top-level (or any) elements in the default namespace; you'd have to use the registerXPathNamespace method to give the default namespace a temporary prefix, and run the XPath search with that prefix.
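A short sketch of that approach, assuming the temporary prefix xhtml (any unused prefix would do) and a trimmed-down copy of the document. In XPath 1.0 an unprefixed name matches only elements in no namespace, which is why //div finds nothing here.

```php
<?php
// Trimmed-down stand-in for test.html (assumption: shortened for the example).
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
  <body>
    <zuq:region name="myRegion">
      <div class="myClass">
        <h1><zuq:data name="myDataHeading" /></h1>
      </div>
    </zuq:region>
  </body>
</html>
XML;

$sxml = simplexml_load_string($xml);

// Unprefixed names do not match elements in the default XHTML namespace.
$unprefixed = $sxml->xpath('//div');

// Bind a temporary prefix ("xhtml" is an arbitrary choice) to the default namespace URI.
$sxml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$prefixed = $sxml->xpath('//xhtml:div');

echo empty($unprefixed) ? "no match without a prefix\n" : "matched without a prefix\n";
echo count($prefixed), " match(es) with the temporary prefix\n";
```

The prefix is registered on the object it is called on, so it may need to be registered again before calling xpath() on an element returned by an earlier query.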
I think you're looking for
//zuq:*/descendant-or-self::*
This returns every element in any subtree whose root has the zuq namespace prefix. The observed behavior seems to be an artifact of SimpleXML (the XPath specification does not deal with trees in the XPath query output, only separate nodes). You can probably solve it using something like
//zuq:*[not(ancestor::zuq:*)]/descendant-or-self::*
The predicate [not(ancestor::zuq:*)] checks whether the element has an ancestor with the zuq prefix and keeps only the elements that have none, so the first step selects only the zuq: roots that have no zuq: ancestor.
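To see how these expressions differ in practice, here is a hedged comparison on a trimmed-down copy of the document (variable names are illustrative). As far as I can tell, SimpleXML hands back the parent element for every text node an expression matches, which would explain the repeated entries in the question's output when node() is used. Because an XPath 1.0 node-set holds each node once, the two descendant-or-self::* expressions should return the same elements; dropping the trailing step leaves just the roots.

```php
<?php
// Trimmed-down stand-in for test.html (assumption: shortened for the example).
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
  <body>
    <zuq:region name="myRegion">
      <div class="myClass">
        <h1><zuq:data name="myDataHeading" /></h1>
      </div>
    </zuq:region>
  </body>
</html>
XML;

$sxml = simplexml_load_string($xml);

// node() also matches whitespace text nodes inside the subtree.
$withNodes = $sxml->xpath('//zuq:*/descendant-or-self::node()');

// * restricts the result to elements.
$elementsOnly = $sxml->xpath('//zuq:*/descendant-or-self::*');

// Starting from the top-level zuq elements covers the same subtrees.
$fromRoots = $sxml->xpath('//zuq:*[not(ancestor::zuq:*)]/descendant-or-self::*');

// Without the trailing step, only the roots themselves are returned.
$rootsOnly = $sxml->xpath('//zuq:*[not(ancestor::zuq:*)]');

$names = array_map(function ($el) { return $el->getName(); }, $elementsOnly);
sort($names);

printf(
    "node(): %d, elements: %d, from roots: %d, roots only: %d\n",
    count($withNodes), count($elementsOnly), count($fromRoots), count($rootsOnly)
);
echo implode(', ', $names), "\n";
```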