How can I get specific elements from XML data?

Posted 2024-11-14 16:25:55

I have some code to retrieve XML data:

import cStringIO
import pycurl
from xml.etree import ElementTree

_API_KEY = 'my api key'
_ima = '/the/path/to/a/image'

sock = cStringIO.StringIO()

upl = pycurl.Curl()

values = [
            ("key", _API_KEY),
            ("image", (upl.FORM_FILE, _ima))]

upl.setopt(upl.URL, "http://api.imgur.com/2/upload.xml")
upl.setopt(upl.HTTPPOST, values)
upl.setopt(upl.WRITEFUNCTION, sock.write)
upl.perform()
upl.close()
xmldata = sock.getvalue()
#print xmldata
sock.close()

The resulting data looks like:

<?xml version="1.0" encoding="utf-8"?>
<upload><image><name></name><title></title><caption></caption><hash>dxPGi</hash><deletehash>kj2XOt4DC13juUW</deletehash><datetime>2011-06-10 02:59:26</datetime><type>image/png</type><animated>false</animated><width>1024</width><height>768</height><size>172863</size><views>0</views><bandwidth>0</bandwidth></image><links><original>https://i.sstatic.net/dxPGi.png</original><imgur_page>http://imgur.com/dxPGi</imgur_page><delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page><small_square>https://i.sstatic.net/dxPGis.jpg</small_square><large_thumbnail>https://i.sstatic.net/dxPGil.jpg</large_thumbnail></links></upload>

Now, following this answer, I'm trying to get some specific values from the data.

This is my attempt:

tree = ElementTree.fromstring(xmldata)
url = tree.findtext('original')
webpage = tree.findtext('imgur_page')
delpage = tree.findtext('delete_page')

print 'Url: ' + str(url)
print 'Pagina: ' + str(webpage)
print 'Link de borrado: ' + str(delpage)

If I try to access .text instead, I get an AttributeError:

Traceback (most recent call last):
  File "<pyshell#28>", line 27, in <module>
    url = tree.find('original').text
AttributeError: 'NoneType' object has no attribute 'text'

I couldn't find anything in Python's help for ElementTree about this attribute. How can I get only the text, not the object?

I found some information about getting a text string here, but when I try it I get a TypeError:

Traceback (most recent call last): 
  File "<pyshell#32>", line 34, in <module>
    print 'Url: ' + url
TypeError: cannot concatenate 'str' and 'NoneType' objects

If I try to print 'Url: ' + str(url) instead, there is no error, but the result shows as None.

How can I get the url, webpage and delete_page data from this XML?
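Both errors above have one cause: the lookup matched nothing and returned None, so .text fails with AttributeError and string concatenation fails with TypeError. A minimal sketch reproducing that, assuming Python 3 and a trimmed copy of the XML above; the helper text_or_none is hypothetical and only shows how to test for a miss before touching .text:

```python
from xml.etree import ElementTree

# Trimmed copy of the upload response: <original> sits inside <links>.
xmldata = ('<upload><image><hash>dxPGi</hash></image>'
           '<links><original>https://i.sstatic.net/dxPGi.png</original></links></upload>')
tree = ElementTree.fromstring(xmldata)

elem = tree.find('original')
print(elem)                       # None: no direct child of <upload> is named 'original'
print(tree.findtext('original'))  # None, for the same reason

def text_or_none(root, path):
    """Hypothetical helper: return the matched element's text, or None on a miss."""
    found = root.find(path)
    # Compare with None explicitly: an Element without children can evaluate as false.
    if found is None:
        return None
    return found.text

print(text_or_none(tree, 'original'))  # None, but no AttributeError
```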

Answer by 萌逼全场, 2024-11-21 16:25:55

Your find() call only looks for an immediate child of the root element with the tag original, not for that tag at any lower level. The same rule applies to findtext(), which is why your first attempt printed None. Use:

url = tree.find('.//original').text

if you want to search the entire tree for an element with the tag original. The pattern matching rules for ElementTree's find() method are laid out in a table on this page: http://effbot.org/zone/element-xpath.htm
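To make the difference in scope concrete, here is a minimal sketch (Python 3, with a trimmed copy of the posted XML) comparing a bare tag name, an explicit child path, and the .// descendant search. All three are part of the XPath subset that ElementTree's find() accepts:

```python
from xml.etree import ElementTree

xmldata = ('<upload><image><hash>dxPGi</hash></image>'
           '<links><original>https://i.sstatic.net/dxPGi.png</original></links></upload>')
root = ElementTree.fromstring(xmldata)

# A bare tag name only matches direct children of root (<image>, <links>).
direct = root.find('original')
# An explicit path walks one level per step: <links>, then <original>.
by_path = root.find('links/original')
# './/' searches every level below root and returns the first match.
anywhere = root.find('.//original')

print(direct)         # None
print(by_path.text)   # https://i.sstatic.net/dxPGi.png
print(anywhere.text)  # https://i.sstatic.net/dxPGi.png
```

The explicit path is stricter and documents where the value lives; .// is convenient when you do not care how deep the element is nested.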

For // matching it says:

Selects all subelements, on all levels beneath the current element (search the entire subtree). For example, “.//egg” selects all “egg” elements in the entire tree.
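Note that find() returns only the first match, while findall() with the same .// pattern returns every match in document order. A minimal sketch using a made-up document (the basket/nest/egg tags are illustrative only, echoing the quoted example):

```python
from xml.etree import ElementTree

# Made-up document: <egg> elements at three different depths.
doc = ElementTree.fromstring(
    '<basket><egg id="1"/><nest><egg id="2"/><nest><egg id="3"/></nest></nest></basket>'
)

print([e.get('id') for e in doc.findall('egg')])     # ['1']  direct children only
print([e.get('id') for e in doc.findall('.//egg')])  # ['1', '2', '3']  whole subtree
print(doc.find('.//egg').get('id'))                  # 1  first match only
```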

Edit: here is some test code for you. It uses the XML sample string you posted, which I ran through XML Tidy in TextMate to make it legible:

from xml.etree import ElementTree
xmldata = '''<?xml version="1.0" encoding="utf-8"?>
<upload>
    <image>
        <name/>
        <title/>
        <caption/>
        <hash>dxPGi</hash>
        <deletehash>kj2XOt4DC13juUW</deletehash>
        <datetime>2011-06-10 02:59:26</datetime>
        <type>image/png</type>
        <animated>false</animated>
        <width>1024</width>
        <height>768</height>
        <size>172863</size>
        <views>0</views>
        <bandwidth>0</bandwidth>
    </image>
    <links>
        <original>https://i.sstatic.net/dxPGi.png</original>
        <imgur_page>http://imgur.com/dxPGi</imgur_page>
        <delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>
        <small_square>https://i.sstatic.net/dxPGis.jpg</small_square>
        <large_thumbnail>https://i.sstatic.net/dxPGil.jpg</large_thumbnail>
    </links>
</upload>'''
tree = ElementTree.fromstring(xmldata)
# './/' searches the whole tree, not just the root's direct children
print(tree.find('.//original').text)

On my machine (OS X running Python 2.6.1), that produces:

Ian-Cs-MacBook-Pro:tmp ian$ python test.py 
https://i.sstatic.net/dxPGi.png
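Applying the same .// pattern to all three values from the question gives the sketch below. It assumes Python 3, a trimmed copy of the XML, and a hypothetical helper name extract_links; the print labels follow the question's code. findtext() returns the text of the first match, an empty string when the element exists but has no text, and its default argument (None unless given) when nothing matches:

```python
from xml.etree import ElementTree

xmldata = (
    '<upload><image><name/><hash>dxPGi</hash></image><links>'
    '<original>https://i.sstatic.net/dxPGi.png</original>'
    '<imgur_page>http://imgur.com/dxPGi</imgur_page>'
    '<delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>'
    '</links></upload>'
)

def extract_links(xml_text):
    """Hypothetical helper: pull the three link values out of the upload response."""
    root = ElementTree.fromstring(xml_text)
    return {
        'url': root.findtext('.//original'),
        'webpage': root.findtext('.//imgur_page'),
        'delpage': root.findtext('.//delete_page'),
    }

links = extract_links(xmldata)
print('Url: ' + links['url'])
print('Pagina: ' + links['webpage'])
print('Link de borrado: ' + links['delpage'])

root = ElementTree.fromstring(xmldata)
print(repr(root.findtext('.//name')))                # '' (element present, no text)
print(repr(root.findtext('.//title', default='?')))  # '?' (no <title> in this trimmed copy)
```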

About the author: 合久必婚