How do I parse XML and get instances of a particular node attribute?
I have many rows in XML and I'm trying to get instances of a particular node attribute.
<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>
How do I access the values of the attribute foobar? In this example, I want "1" and "2".
I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself; in this context, though, what they chiefly add is even more speed. The ease of programming depends on the API, which ElementTree defines.
First build an Element instance root from the XML, e.g. with the XML function or by parsing a file; many other ways are shown in the ElementTree documentation. Then find the elements you need under root and read their attributes.
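A minimal sketch of that approach, using the sample document from the question. The variable names and the path 'bar/type' are chosen here for illustration and are not taken from the original answer.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML_TEXT = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

# ET.fromstring() parses a string and returns the root Element (<foo> here).
root = ET.fromstring(XML_TEXT)

# 'bar/type' is a path relative to the root element.
values = [type_tag.get("foobar") for type_tag in root.findall("bar/type")]

for value in values:
    print(value)
# prints 1, then 2
```

For a file on disk, ET.parse(path).getroot() gives the same kind of root element.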
minidom is the quickest and pretty straightforward.
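As a rough illustration of the minidom route, the sketch below parses the question's sample from a string. The variable names are assumptions made for this example.

```python
from xml.dom import minidom

# Sample document from the question.
XML_TEXT = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

# parseString() builds a DOM Document from an in-memory string;
# minidom.parse(path) does the same for a file.
document = minidom.parseString(XML_TEXT)

# getElementsByTagName() searches the whole document, at any depth.
type_nodes = document.getElementsByTagName("type")
foobar_values = [node.getAttribute("foobar") for node in type_nodes]

print(len(type_nodes))  # 2
print(foobar_values)    # ['1', '2']
```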
You can use BeautifulSoup.
There are many options out there. cElementTree looks excellent if speed and memory usage are an issue. It has very little overhead compared to simply reading in the file using readlines.
The relevant metrics can be found in the table below, copied from the cElementTree website:
As pointed out by @jfs, cElementTree comes bundled with Python:
Python 2: from xml.etree import cElementTree as ElementTree.
Python 3: from xml.etree import ElementTree (the accelerated C version is used automatically; the separate cElementTree module was removed in Python 3.9).
I suggest xmltodict for simplicity.
It parses your XML to an OrderedDict.
lxml.objectify is really simple.
Python has an interface to the expat XML parser.
It's a non-validating parser, so it checks only that the XML is well formed; a document that violates its DTD or schema will not be caught. But if you know your file is correct, then this is pretty good: you get exactly the info you want and can discard the rest on the fly.
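A small sketch of the expat interface, assuming the sample document from the question. The handler name and the list used to collect results are hypothetical. It keeps only the foobar values as the parser streams through the input, and it also shows that malformed XML is still rejected.

```python
import xml.parsers.expat

# Sample document from the question.
XML_TEXT = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

found = []

def handle_start(name, attrs):
    # Called once per start tag; attrs maps attribute names to values.
    if name == "type" and "foobar" in attrs:
        found.append(attrs["foobar"])

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = handle_start
parser.Parse(XML_TEXT, True)  # True marks this as the final chunk of input

print(found)  # ['1', '2']

# Non-validating does not mean lenient: mismatched tags raise ExpatError.
try:
    xml.parsers.expat.ParserCreate().Parse("<foo><bar></foo>", True)
    malformed_message = None
except xml.parsers.expat.ExpatError as error:
    malformed_message = str(error)

print(malformed_message)
```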
Just to add another possibility, you can use untangle, a simple XML-to-Python-object library.
More information about untangle can be found in "untangle".
Also, if you are curious, you can find a list of tools for working with XML and Python in "Python and XML". You will also see that the most common ones were mentioned in previous answers.
I might suggest declxml.
Full disclosure: I wrote this library because I was looking for a way to convert between XML and Python data structures without needing to write dozens of lines of imperative parsing/serialization code with ElementTree.
With declxml, you use processors to declaratively define the structure of your XML document and how to map between XML and Python data structures. Processors are used for both serialization and parsing, as well as for a basic level of validation.
Parsing into Python data structures is straightforward. You can also use the same processor to serialize data back to XML. If you want to work with objects instead of dictionaries, you can define processors to transform data to and from objects as well.
Here is a very simple but effective piece of code using cElementTree. This is from "python xml parse".
xml.etree.ElementTree vs. lxml
These are some pros of the two most used libraries that I would have benefited from knowing before choosing between them.
xml.etree.ElementTree:
lxml:
standalone="no"?
.node
.sourceline allows you to easily get the line of the XML element you are using.
There's no need to use a lib-specific API if you use python-benedict. Just initialize a new instance from your XML and manage it easily, since it is a dict subclass.
Installation is easy: pip install python-benedict
It supports and normalizes I/O operations with many formats: Base64, CSV, JSON, TOML, XML, YAML and query-string.
It is well tested and open-source on GitHub. Disclosure: I am the author.
This will print the value of the foobar attribute.
simplified_scrapy: a new library that I fell in love with after using it. I recommend it to you. There are more examples available, and the library is easy to use.
I am surprised that no one has suggested pandas. Pandas has a function, read_xml(), which is perfect for such flat XML structures.
If you don't want to use any external libraries or third-party tools, you can parse XML into a Python dictionary with the standard library alone, including empty tags like <tag/> and tags with only attributes like <tag var=val/>.
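One way such a converter might look is sketched below. The helper name element_to_dict and the "@attributes" and "#text" keys are conventions invented for this example, not part of any library and not the original answer's code.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Convert an Element into a plain dict (hypothetical helper)."""
    node = {}
    if element.attrib:
        # Attributes go under a reserved key so they cannot clash with child tags.
        node["@attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    for child in element:
        # Children are always collected in lists, so repeated tags need no special case.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node

XML_TEXT = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
    <empty/>
  </bar>
</foo>"""

root = ET.fromstring(XML_TEXT)
result = {root.tag: element_to_dict(root)}

# An empty tag such as <empty/> becomes an empty dict.
types = result["foo"]["bar"][0]["type"]
print([item["@attributes"]["foobar"] for item in types])  # ['1', '2']
```

Always storing children in lists keeps the lookup code uniform, at the cost of an extra [0] for tags that occur only once.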
If the source is an XML file, you can parse it directly from the file.
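A sketch of reading from a file with the standard library's ElementTree follows. The temporary file stands in for a real XML file and exists only to keep the example self-contained; the file contents are the sample from the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XML_TEXT = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

# Write the sample to a temporary file (a stand-in for your own file).
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as handle:
    handle.write(XML_TEXT)
    xml_path = handle.name

try:
    tree = ET.parse(xml_path)  # parse() accepts a filename or a file object
    root = tree.getroot()
    # iter("type") walks the whole tree, so the nesting depth does not matter.
    file_values = [element.attrib["foobar"] for element in root.iter("type")]
finally:
    os.remove(xml_path)

print(file_values)  # ['1', '2']
```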
With iterparse() you can catch the tag attribute dictionary value.
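A minimal sketch of the idea, assuming the question's sample is supplied as an in-memory byte stream (a filename would work as well). The variable names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

XML_BYTES = b"""<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

streamed = []

# Attributes are already available on the "start" event, before the
# element's children and text have been read.
for event, element in ET.iterparse(io.BytesIO(XML_BYTES), events=("start",)):
    if element.tag == "type":
        streamed.append(dict(element.attrib))

print(streamed)  # [{'foobar': '1'}, {'foobar': '2'}]
```

Because iterparse() yields elements incrementally, this style suits files too large to hold comfortably in memory.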
The classic SAX parser solution could also work.
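A SAX version might look roughly like the following. The handler class name FoobarHandler and its values attribute are hypothetical, and the input is the sample from the question.

```python
import xml.sax

XML_BYTES = b"""<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

class FoobarHandler(xml.sax.ContentHandler):
    """Collects the foobar attribute of every <type> element (illustrative)."""

    def __init__(self):
        super().__init__()
        self.values = []

    def startElement(self, name, attrs):
        # attrs is a SAX Attributes object; getNames() lists the attribute names.
        if name == "type" and "foobar" in attrs.getNames():
            self.values.append(attrs.getValue("foobar"))

handler = FoobarHandler()
xml.sax.parseString(XML_BYTES, handler)

print(handler.values)  # ['1', '2']
```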