PHP XPath query on XML with a default namespace binding
I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this.
Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:
./xpeg "//MainType[@ID=123]"
What seems most strange is this line, without which my approach doesn’t work:
$domdoc->loadXML($domdoc->saveXML($domdoc));
As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn’t be necessary.
Is there a better way to perform xpath queries on this XML in PHP?
XML (note the binding of the default namespace):
<?xml version="1.0" encoding="utf-8"?>
<MyRoot
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
    xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site">
    <Price>$0.20</Price>
    <TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="123" comment="Test site">
    <Price>$99.95</Price>
    <TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="922" comment="Health Insurance">
    <Price>$600.00</Price>
    <TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="389" comment="Used Cars">
    <Price>$5000.00</Price>
    <TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
</MyRoot>
PHP CLI Script:
#!/usr/bin/php-cli
<?php
$xml = file_get_contents("xpeg.xml");
$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue, "");

// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML($domdoc));

$xpath = new DOMXpath($domdoc);
$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
    dump_dom_levels($result);
}
else {
    echo "error\n";
}

// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
    $class = get_class($node);
    if ($class == "DOMNodeList") {
        echo "Level $level ($class): $node->length items\n";
        foreach ($node as $child_node) {
            dump_dom_levels($child_node, $level+1);
        }
    }
    else {
        $nChildren = 0;
        foreach ($node->childNodes as $child_node) {
            if ($child_node->hasChildNodes()) {
                $nChildren++;
            }
        }
        if ($nChildren) {
            echo "Level $level ($class): $nChildren children\n";
        }
        foreach ($node->childNodes as $child_node) {
            if ($child_node->hasChildNodes()) {
                dump_dom_levels($child_node, $level+1);
            }
        }
    }
}
?>
The solution is to use the namespace, not to get rid of it: register a prefix for the namespace URI on the DOMXPath object before querying.
Then call the script like this on the command line (note the x: in the XPath expression):
./xpeg "//x:MainType[@ID=123]"
You can make this more shiny by registering all namespaces before calling $xpath->query(), and by accepting arguments of the form xyz=http//namespace.uri/ to create custom namespace prefixes.
Bottom line: in XPath you can't query //foo when you really mean //namespace:foo. These are fundamentally different and therefore select different nodes. The fact that XML can define a default namespace (and thus can drop explicit namespace usage in the document) does not mean you can drop namespace usage in XPath.
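The prefix-registration approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's exact code: the inline XML is a shortened copy of the question's document, and the prefix x is simply the one mentioned in the answer. DOMXPath::registerNamespace() is the standard PHP call for binding a prefix to a namespace URI.

```php
<?php
// Shortened inline copy of the question's XML, so the sketch is self-contained.
$xml = <<<'DOC'
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"><Price>$0.20</Price></MainType>
  <MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>
</MyRoot>
DOC;

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

$xpath = new DOMXPath($domdoc);
// Bind the prefix "x" to the URI of the document's default namespace.
$xpath->registerNamespace('x', 'http://www.example.com/data');

// The unprefixed name test only matches elements in no namespace.
$unprefixed = $xpath->query('//MainType[@ID=123]');
// The prefixed name test matches elements in the registered namespace.
$prefixed = $xpath->query('//x:MainType[@ID=123]');

echo "unprefixed matches: {$unprefixed->length}\n";
echo "prefixed matches: {$prefixed->length}\n";
foreach ($prefixed as $node) {
    echo $node->getAttribute('comment'), "\n";
}
```

The prefix used in the query does not have to match anything in the document; only the namespace URI has to be the same.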
Just out of curiosity, what happens if you remove the line that calls removeAttributeNS()?
That strikes me as the most likely cause of the need for your hack. You're basically removing the xmlns="http://www.example.com/data" part and then rebuilding the DOMDocument. Have you considered simply using string functions to remove that namespace, and then continuing on your way? It might even end up being faster.
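The string-function idea could look something like the sketch below. It assumes the default namespace declaration appears exactly once, on the root element, spelled exactly as in the question's XML; a document that declares it differently (single quotes, extra whitespace) would need a more careful pattern. The inline XML is a shortened stand-in for xpeg.xml.

```php
<?php
// Shortened inline stand-in for the question's xpeg.xml.
$xml = <<<'DOC'
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"><Price>$0.20</Price></MainType>
  <MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>
</MyRoot>
DOC;

// Assumption: the declaration has exactly this spelling. The leading space
// keeps the xmlns:xsi declaration untouched.
$stripped = str_replace(' xmlns="http://www.example.com/data"', '', $xml);

// Parse once, after the namespace declaration is gone.
$domdoc = new DOMDocument();
$domdoc->loadXML($stripped);

$xpath = new DOMXPath($domdoc);
$result = $xpath->query('//MainType[@ID=123]');

echo "matches: {$result->length}\n";
```

This parses the document only once, but it changes what the elements mean: they are no longer in the http://www.example.com/data namespace, which matters if anything downstream relies on it.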
Given the current state of the XPath language, I feel that the best answer is provided by Tomalek: to associate a prefix with the default namespace and to prefix all tag names. That’s the solution I intend to use in my current application.
When that’s not possible or practical, a better solution than my hack is to invoke a method that does the same thing as re-scanning (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves “as if you saved and then loaded the document, putting the document in a ‘normal’ form.”
Also, as a variant, you may use an XPath mask:
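The expression this answer had in mind is not shown here. One plausible reading, offered only as an assumption, is a wildcard name test filtered with the XPath 1.0 local-name() function, which matches elements by name regardless of their namespace. A minimal sketch with a shortened copy of the question's XML:

```php
<?php
// Shortened inline copy of the question's XML.
$xml = <<<'DOC'
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"><Price>$0.20</Price></MainType>
  <MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>
</MyRoot>
DOC;

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);
$xpath = new DOMXPath($domdoc);

// "*" matches any element; local-name() compares only the part of the
// name after the prefix, so no namespace registration is needed.
$result = $xpath->query("//*[local-name()='MainType'][@ID=123]");

echo "matches: {$result->length}\n";
```

The trade-off is that such a query ignores namespaces entirely, so it would also match a MainType element from an unrelated namespace if the document contained one.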