How do I get specific elements from XML data?

Posted on 2024-11-14 16:25:55

I have some code to retrieve XML data:

import cStringIO
import pycurl
from xml.etree import ElementTree

_API_KEY = 'my api key'
_ima = '/the/path/to/a/image'

sock = cStringIO.StringIO()

upl = pycurl.Curl()

values = [
            ("key", _API_KEY),
            ("image", (upl.FORM_FILE, _ima))]

upl.setopt(upl.URL, "http://api.imgur.com/2/upload.xml")
upl.setopt(upl.HTTPPOST, values)
upl.setopt(upl.WRITEFUNCTION, sock.write)
upl.perform()
upl.close()
xmldata = sock.getvalue()
#print xmldata
sock.close()

The resulting data looks like:

<?xml version="1.0" encoding="utf-8"?>
<upload><image><name></name><title></title><caption></caption><hash>dxPGi</hash><deletehash>kj2XOt4DC13juUW</deletehash><datetime>2011-06-10 02:59:26</datetime><type>image/png</type><animated>false</animated><width>1024</width><height>768</height><size>172863</size><views>0</views><bandwidth>0</bandwidth></image><links><original>https://i.sstatic.net/dxPGi.png</original><imgur_page>http://imgur.com/dxPGi</imgur_page><delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page><small_square>https://i.sstatic.net/dxPGis.jpg</small_square><large_thumbnail>https://i.sstatic.net/dxPGil.jpg</large_thumbnail></links></upload>

Now, following this answer, I'm trying to get some specific values from the data.

This is my attempt:

tree = ElementTree.fromstring(xmldata)
url = tree.findtext('original')
webpage = tree.findtext('imgur_page')
delpage = tree.findtext('delete_page')

print 'Url: ' + str(url)
print 'Pagina: ' + str(webpage)
print 'Link de borrado: ' + str(delpage)

I get an AttributeError if I try to add the .text access:

Traceback (most recent call last):
  File "<pyshell#28>", line 27, in <module>
    url = tree.find('original').text
AttributeError: 'NoneType' object has no attribute 'text'

I couldn't find anything in Python's help for ElementTree about this attribute. How can I get only the text, not the object?

I found some info about getting a text string here; but when I try it I get a TypeError:

Traceback (most recent call last): 
  File "<pyshell#32>", line 34, in <module>
    print 'Url: ' + url
TypeError: cannot concatenate 'str' and 'NoneType' objects

If I try to print 'Url: ' + str(url) instead, there is no error, but the result shows as None.

How can I get the url, webpage and delete_page data from this XML?

Comments (1)

萌逼全场 2024-11-21 16:25:55

Your find() call is trying to find an immediate child of the top of the tree with a tag named original, not a tag at any lower level than that. Use:

url = tree.find('.//original').text

if you want to search the whole tree for an element with the tag named original. Note that find() returns only the first match; findall() returns every match. The pattern matching rules for ElementTree's find() method are laid out in a table on this page: http://effbot.org/zone/element-xpath.htm
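To make the difference concrete, here is a minimal sketch comparing the three lookups. It is written for Python 3 (an assumption; the snippets in this thread are Python 2), and the XML is a trimmed-down copy of the response from the question that keeps the same nesting.

```python
from xml.etree import ElementTree

# Trimmed-down copy of the response in the question (same nesting, fewer fields).
xmldata = (
    "<upload>"
    "<image><hash>dxPGi</hash></image>"
    "<links><original>https://i.sstatic.net/dxPGi.png</original></links>"
    "</upload>"
)

tree = ElementTree.fromstring(xmldata)  # tree is the <upload> element

# 'original' only matches direct children of <upload>, so nothing is found.
direct = tree.find('original')

# An explicit path from the root works because <original> sits under <links>.
by_path = tree.find('links/original')

# './/original' searches the whole subtree; find() returns the first match.
anywhere = tree.find('.//original')

print(direct)         # None, which is why .text raised AttributeError
print(by_path.text)
print(anywhere.text)
```

If the structure of the response is known and stable, the explicit path links/original is the stricter choice; .//original is more forgiving if the nesting changes.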

For // matching it says:

Selects all subelements, on all levels beneath the current element (search the entire subtree). For example, “.//egg” selects all “egg” elements in the entire tree.
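As a small illustration of that rule, the sketch below uses findall() with a .// pattern to collect matches at any depth. It assumes Python 3 and a shortened version of the XML from the question; the variable names are only for illustration.

```python
from xml.etree import ElementTree

# Shortened version of the response from the question (links section only).
xmldata = """<upload>
    <links>
        <original>https://i.sstatic.net/dxPGi.png</original>
        <imgur_page>http://imgur.com/dxPGi</imgur_page>
        <delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>
    </links>
</upload>"""

tree = ElementTree.fromstring(xmldata)

# findall() returns a list with every match, in document order.
originals = tree.findall('.//original')
print(len(originals))

# './/*' matches every descendant element, at any depth below the root.
tags = [element.tag for element in tree.findall('.//*')]
print(tags)
```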

Edit: here is some test code for you. It uses the XML sample string you posted; I just ran it through XML Tidy in TextMate to make it legible:

from xml.etree import ElementTree
xmldata = '''<?xml version="1.0" encoding="utf-8"?>
<upload>
    <image>
        <name/>
        <title/>
        <caption/>
        <hash>dxPGi</hash>
        <deletehash>kj2XOt4DC13juUW</deletehash>
        <datetime>2011-06-10 02:59:26</datetime>
        <type>image/png</type>
        <animated>false</animated>
        <width>1024</width>
        <height>768</height>
        <size>172863</size>
        <views>0</views>
        <bandwidth>0</bandwidth>
    </image>
    <links>
        <original>https://i.sstatic.net/dxPGi.png</original>
        <imgur_page>http://imgur.com/dxPGi</imgur_page>
        <delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>
        <small_square>https://i.sstatic.net/dxPGis.jpg</small_square>
        <large_thumbnail>https://i.sstatic.net/dxPGil.jpg</large_thumbnail>
    </links>
</upload>'''
tree = ElementTree.fromstring(xmldata)
print tree.find('.//original').text

On my machine (OS X running Python 2.6.1) that produces:

Ian-Cs-MacBook-Pro:tmp ian$ python test.py 
https://i.sstatic.net/dxPGi.png
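The same idea extends to all three values asked about in the question. The following is a sketch only, assuming Python 3; extract_links is a hypothetical helper name, not part of ElementTree. It relies on findtext(), which returns the text of the first match or the given default when nothing matches, so the string concatenation from the question no longer hits None.

```python
from xml.etree import ElementTree


def extract_links(xmldata):
    """Return the three link values from an upload response as a dict.

    Hypothetical helper for illustration; tags that are missing map to ''.
    """
    tree = ElementTree.fromstring(xmldata)
    wanted = ('original', 'imgur_page', 'delete_page')
    # findtext() gives the text of the first match, or `default` if none.
    return {tag: tree.findtext('.//' + tag, default='') for tag in wanted}


# Shortened version of the response from the question.
sample = (
    "<upload><links>"
    "<original>https://i.sstatic.net/dxPGi.png</original>"
    "<imgur_page>http://imgur.com/dxPGi</imgur_page>"
    "<delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>"
    "</links></upload>"
)

links = extract_links(sample)
print('Url: ' + links['original'])
print('Pagina: ' + links['imgur_page'])
print('Link de borrado: ' + links['delete_page'])
```

Using an empty-string default is a design choice for this sketch; passing no default and checking for None explicitly may be preferable when a missing element should be treated as an error.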