Python: libxml2 xpath returns an empty list

Posted on 2024-11-04 19:51:07


I want to parse XML content with Python's libxml2 using XPath. I followed this example and that tutorial (http://www.zvon.org/xxl/XPathTutorial/General/examples.html). The XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
<title>Gmail - Inbox for [email protected]</title>
<tagline>New messages in your Gmail Inbox</tagline>
<fullcount>1</fullcount>
<link rel="alternate" href="http://mail.google.com/mail" type="text/html"/>
<modified>2011-05-04T18:56:19Z</modified>
</feed>

This XML is stored in a file called "atom", and I tried the following:

>>> import libxml2
>>> myfile = open('/pathtomyfile/atom', 'r').read()
>>> data = libxml2.parseDoc(myfile)
>>> data.xpathEval('/fullcount')
[]
>>>

As you can see, it returns an empty list. Whatever element name I give XPath, the result is an empty list. However, if I use the * wildcard, I get a list of all nodes:

>>> data.xpathEval('//*')
[<xmlNode (feed) object at 0xb73862cc>, <xmlNode (title) object at 0xb738650c>, <xmlNode (tagline) object at 0xb73865ec>, <xmlNode (fullcount) object at 0xb738660c>, <xmlNode (link) object at 0xb738662c>, <xmlNode (modified) object at 0xb738664c>]

Judging from the working examples above, I don't understand why XPath doesn't find the "fullcount" node or any other. I'm using the same syntax, after all.

Any ideas or suggestions? Thanks.


离旧人 2024-11-11 19:51:08

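Your XPath is failing because every element in the feed belongs to the document's default namespace (http://purl.org/atom/ns#). You need to register a prefix for that namespace, here "purl", and use it in the expression: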

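import libxml2

# Read the XML from the file named in the question
data = open('/pathtomyfile/atom').read()
tree = libxml2.parseDoc(data)
xp = tree.xpathNewContext()
# Bind a prefix of our choosing to the feed's default namespace URI
xp.xpathRegisterNs("purl", "http://purl.org/atom/ns#")
print(xp.xpathEval('//purl:fullcount'))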

Result:

[<xmlNode (fullcount) object at 0x7fbbeba9ef80>]

(Also: check out lxml, which has a nicer, higher-level interface.)
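
To see why `//*` matches while a bare `fullcount` does not, it can help to look at the names the parser actually assigns to the elements. The sketch below uses the standard library's xml.etree.ElementTree instead of libxml2, only because it needs no extra install; the namespace behaviour it shows is the same. The XML is a trimmed copy of the feed from the question, and the `purl` prefix is an arbitrary label, not something the document defines.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"

# Trimmed copy of the feed from the question (illustrative only)
XML = (
    '<feed xmlns="http://purl.org/atom/ns#" version="0.3">'
    '<title>Inbox</title>'
    '<fullcount>1</fullcount>'
    '</feed>'
)

root = ET.fromstring(XML)

# The default namespace becomes part of every element's name
print(root.tag)

# A bare name means "fullcount in no namespace", which does not exist here
unqualified = root.findall("fullcount")

# Mapping a prefix (any prefix) to the namespace URI makes the lookup succeed
qualified = root.findall("purl:fullcount", {"purl": ATOM_NS})

print(len(unqualified), len(qualified))
print(qualified[0].text)
```

The wildcard matches elements in any namespace, which is why it was the only expression in the question that returned nodes.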


放赐 2024-11-11 19:51:08

Firstly:

/fullcount is an absolute path, so it looks for a <fullcount> element at the root of the document, when the element is in fact inside the <feed> element. Use /feed/fullcount or //fullcount instead (with the namespace prefix described below).

Secondly:

You need to specify the namespace. This is how you would do it with lxml:

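import lxml.etree as etree

tree = etree.parse('/pathtomyfile/atom')

# 'ns' is an arbitrary prefix mapped to the feed's default namespace URI
fullcounts = tree.xpath('//ns:fullcount',
                        namespaces={'ns': "http://purl.org/atom/ns#"})

# encoding='unicode' returns text rather than bytes
print(etree.tostring(fullcounts[0], encoding='unicode'))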

Which would give you:

<fullcount xmlns="http://purl.org/atom/ns#">1</fullcount>
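
The two points above (absolute path and namespace) can be checked side by side. This is a minimal sketch using lxml on a trimmed copy of the feed from the question, passed as an inline bytes literal rather than read from the file; the `ns` prefix and the variable names are arbitrary choices for illustration.

```python
from lxml import etree

NS = {"ns": "http://purl.org/atom/ns#"}

# Trimmed copy of the feed from the question. Bytes are used because lxml
# rejects str input that carries an encoding declaration.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
<fullcount>1</fullcount>
</feed>"""

root = etree.fromstring(XML)

# Absolute path and no namespace: wrong on both counts
no_match = root.xpath("/fullcount")

# Namespace fixed, but the path still expects fullcount as the root element
still_empty = root.xpath("/ns:fullcount", namespaces=NS)

# Absolute path spelled out from the root element, with the namespace
absolute = root.xpath("/ns:feed/ns:fullcount", namespaces=NS)

# Descendant search, with the namespace
anywhere = root.xpath("//ns:fullcount", namespaces=NS)

# string() returns the text content of the first match directly
count = root.xpath("string(//ns:fullcount)", namespaces=NS)

print(len(no_match), len(still_empty), len(absolute), len(anywhere), count)
```

Only the last two element queries return the node, which shows that both fixes are needed together.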
