Python: libxml2 XPath returns an empty list

Posted on 2024-11-04 19:51:07

I want to parse XML content with Python's libxml2 bindings using XPath. I followed this example and that tutorial (http://www.zvon.org/xxl/XPathTutorial/General/examples.html). The XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
<title>Gmail - Inbox for [email protected]</title>
<tagline>New messages in your Gmail Inbox</tagline>
<fullcount>1</fullcount>
<link rel="alternate" href="http://mail.google.com/mail" type="text/html"/>
<modified>2011-05-04T18:56:19Z</modified>
</feed>

This XML is stored in a file called "atom", and I try the following:

>>> import libxml2
>>> myfile = open('/pathtomyfile/atom', 'r').read()
>>> xmldata = libxml2.parseDoc(myfile)
>>> xmldata.xpathEval('/fullcount')
[]
>>>

As you can see, it returns an empty list. Whatever element name I give XPath, it returns an empty list. However, if I use the * wildcard, I get a list of all nodes:

>>> xmldata.xpathEval('//*')
[<xmlNode (feed) object at 0xb73862cc>, <xmlNode (title) object at 0xb738650c>, <xmlNode (tagline) object at 0xb73865ec>, <xmlNode (fullcount) object at 0xb738660c>, <xmlNode (link) object at 0xb738662c>, <xmlNode (modified) object at 0xb738664c>]

Judging from the working examples linked above, I don't understand why XPath doesn't find the "fullcount" node or any other; I'm using the same syntax, after all.

Any idea or suggestion? Thanks.

Comments (2)

离旧人 2024-11-11 19:51:08

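Your XPath fails because every element in the feed is in the default namespace http://purl.org/atom/ns#, and XPath 1.0 only matches namespaced elements through a registered prefix. Register one (here "purl") and use it in the expression:

import libxml2

# Read the XML document from the question into a string
data = open('/pathtomyfile/atom', 'r').read()
tree = libxml2.parseDoc(data)
# Bind the prefix "purl" to the feed's namespace URI for this XPath context
xp = tree.xpathNewContext()
xp.xpathRegisterNs("purl", "http://purl.org/atom/ns#")
print(xp.xpathEval('//purl:fullcount'))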

Result:

[<xmlNode (fullcount) object at 0x7fbbeba9ef80>]

(Also: check out lxml, it has a nicer, higher-level interface).
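
The namespace effect described above can be reproduced without libxml2. The following is only a minimal sketch that swaps in the standard library's xml.etree.ElementTree (not the libxml2 API from this answer) and embeds a trimmed copy of the feed from the question; the variable names and the "ns" prefix are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared by the feed's xmlns attribute
ATOM_NS = "http://purl.org/atom/ns#"

# Trimmed copy of the feed from the question (bytes, so the encoding
# declaration is honoured by the parser)
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
<tagline>New messages in your Gmail Inbox</tagline>
<fullcount>1</fullcount>
<modified>2011-05-04T18:56:19Z</modified>
</feed>"""

root = ET.fromstring(FEED)

# The parsed tag name carries the namespace, so a bare name matches nothing
unqualified = root.findall(".//fullcount")

# Mapping a prefix (here the arbitrary "ns") to the URI makes the match succeed
qualified = root.findall(".//ns:fullcount", namespaces={"ns": ATOM_NS})

print(root.tag)
print(len(unqualified), len(qualified))
print(qualified[0].text)
```

The prefix itself is arbitrary; only the namespace URI it maps to has to match the one declared in the document.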

放赐 2024-11-11 19:51:08

Firstly:

/fullcount is an absolute path, so it looks for a <fullcount> element at the root of the document, whereas the element is actually a child of the <feed> root element.

Secondly:

You need to specify the namespace. This is how you would do it with lxml:

import lxml.etree as etree

tree = etree.parse('/pathtomyfile/atom')

fullcounts = tree.xpath('//ns:fullcount',
                namespaces={'ns': "http://purl.org/atom/ns#"})

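# encoding='unicode' returns a text string, so this prints cleanly on Python 3 as well
print(etree.tostring(fullcounts[0], encoding='unicode'))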

Which would give you:

<fullcount xmlns="http://purl.org/atom/ns#">1</fullcount>
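
Both points from this answer (the absolute path and the namespace) can be checked side by side. This is a rough sketch assuming lxml is installed; it parses a trimmed copy of the feed from an inline string rather than the file, and the variable names are made up for illustration.

```python
from lxml import etree

# Prefix-to-URI mapping; the prefix "ns" is arbitrary
NS = {"ns": "http://purl.org/atom/ns#"}

# Trimmed copy of the feed from the question
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
<tagline>New messages in your Gmail Inbox</tagline>
<fullcount>1</fullcount>
<modified>2011-05-04T18:56:19Z</modified>
</feed>"""

root = etree.fromstring(FEED)

# Absolute path to a root-level <fullcount>: nothing, because the root is <feed>
at_root = root.xpath("/ns:fullcount", namespaces=NS)

# Absolute path that goes through <feed>: one match
via_feed = root.xpath("/ns:feed/ns:fullcount", namespaces=NS)

# Descendant search anywhere in the document: one match
anywhere = root.xpath("//ns:fullcount", namespaces=NS)

# Namespace-agnostic match on the local name, no prefix mapping needed
by_local_name = root.xpath("//*[local-name() = 'fullcount']")

# XPath string() returns the text content directly
count = root.xpath("string(//ns:fullcount)", namespaces=NS)

print(len(at_root), len(via_feed), len(anywhere), len(by_local_name))
print(count)
```

The local-name() form is handy for quick exploration, but it ignores namespaces entirely, so the prefixed form is the safer choice when a document mixes vocabularies.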