Parsing an XML file into Python objects
I have an XML file which looks like this:
<encspot>
<file>
<Name>some filename.mp3</Name>
<Encoder>Gogo (after 3.0)</Encoder>
<Bitrate>131</Bitrate>
<Mode>joint stereo</Mode>
<Length>00:02:43</Length>
<Size>5,236,644</Size>
<Frame>no</Frame>
<Quality>good</Quality>
<Freq.>44100</Freq.>
<Frames>6255</Frames>
..... and so forth ......
</file>
<file>....</file>
</encspot>
I want to read it into a Python object, something like a list of dictionaries. Because the markup is absolutely fixed, I'm tempted to use regex (I'm quite good at using those). However, I thought I'd check whether someone knows how to easily avoid regexes here. I don't have much experience with SAX or other parsing approaches, but I'm willing to learn.
I'm looking forward to being shown how this is done quickly without regexes in Python. Thanks for your help!
My beloved SD Chargers hat is off to you if you think a regex is easier than this:
Output:
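A minimal sketch of the kind of ElementTree loop this answer has in mind, not necessarily the answerer's exact code. It assumes the XML is held in a string; the `XML_TEXT` name and the trimmed two-file sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the same shape as the question's XML.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Encoder>Gogo (after 3.0)</Encoder>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Encoder>iTunes</Encoder>
    <Bitrate>128</Bitrate>
  </file>
</encspot>"""

root = ET.fromstring(XML_TEXT)

lines = []
for file_elem in root:        # each direct child of <encspot>
    for field in file_elem:   # each child element of <file>
        lines.append(f"{field.tag}: {field.text}")

print("\n".join(lines))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.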
If your attraction to a regex is being terse, here is an equally incomprehensible bit of list comprehension to create a data structure:
Which creates a list of tuples of the XML children of <file> in document order.
With a few more lines and a little more thought, obviously, you can create any data structure that you want from XML with ElementTree. It is part of the Python distribution.
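As a hedged illustration of such a list comprehension (a sketch under the same assumptions as the question's markup, with a hypothetical shortened sample), the following builds a flat list of `(tag, text)` tuples:

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample; the real file has more fields per <file>.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Bitrate>128</Bitrate>
  </file>
</encspot>"""

root = ET.fromstring(XML_TEXT)

# One (tag, text) tuple per child of every <file>, in document order.
pairs = [(child.tag, child.text)
         for file_elem in root.findall("file")
         for child in file_elem]

print(pairs)
```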
Edit
Code golf is on!
If your XML only has the file section, you can choose your golf. If your XML has other tags or other sections, you need to account for the section the children are in, and you will need to use findall.
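To illustrate why `findall` matters when other sections are present, here is a small sketch. The `<summary>` element is a made-up example of an unrelated section; it does not appear in the question's XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an extra, non-<file> section.
XML_TEXT = """<encspot>
  <summary>
    <Name>library report</Name>
  </summary>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Bitrate>128</Bitrate>
  </file>
</encspot>"""

root = ET.fromstring(XML_TEXT)

# Iterating the root directly visits every section, wanted or not.
all_sections = [child.tag for child in root]

# findall("file") restricts the result to the direct <file> children.
records = [{field.tag: field.text for field in file_elem}
           for file_elem in root.findall("file")]

print(all_sections)
print(records)
```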
There is a tutorial on ElementTree at Effbot.org
Use ElementTree. You don't need/want to muck about with a parse-only gadget like pyexpat ... you'd only end up re-inventing ElementTree partially and poorly.
Another possibility is lxml, a third-party package that implements the ElementTree interface plus more.
Update: Someone started playing code golf; here's my entry, which actually creates the data structure you asked for:
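A sketch of one way to get the list of dictionaries the question asks for, using only ElementTree from the standard library. This is an illustration rather than the answerer's original entry, and the sample string is a hypothetical cut-down version of the question's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample of the question's XML.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
    <Freq.>44100</Freq.>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Bitrate>128</Bitrate>
    <Freq.>48000</Freq.>
  </file>
</encspot>"""

root = ET.fromstring(XML_TEXT)

# One dict per <file>, mapping each child tag to its text.
records = [{field.tag: field.text for field in file_elem}
           for file_elem in root.findall("file")]

print(records)
```

All values come back as strings at this point; converting them is a separate step.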
You'd probably want to have a dict mapping "attribute" names to conversion functions:
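A sketch of what such a mapping could look like. The `CONVERTERS` table and the `convert` helper are hypothetical names chosen for illustration, and the choice of which fields become integers is an assumption based on the sample values in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample of the question's XML.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
    <Size>5,236,644</Size>
    <Frames>6255</Frames>
  </file>
</encspot>"""

# Tag name -> conversion function; tags not listed stay as strings.
CONVERTERS = {
    "Bitrate": int,
    "Frames": int,
    "Freq.": int,
    "Size": lambda s: int(s.replace(",", "")),  # "5,236,644" -> 5236644
}

def convert(tag, text):
    """Apply the converter registered for this tag, if any."""
    if text is None:  # empty element such as <Name/>
        return None
    return CONVERTERS.get(tag, str)(text)

root = ET.fromstring(XML_TEXT)
records = [{field.tag: convert(field.tag, field.text) for field in file_elem}
           for file_elem in root.findall("file")]

print(records)
```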
I have also been looking for a simple way to transform data between XML documents and Python data structures, something similar to Golang's XML library which allows you to declaratively specify how to map from data structures to XML.
I was unable to find such a library for Python, so I wrote one to meet my need. It is called declxml, for declarative XML processing.
With declxml, you create processors which declaratively define the structure of your XML document. Processors are used to perform both parsing and serialization as well as a basic level of validation.
Parsing this XML data into a list of dictionaries with declxml is straightforward
Which produces the following result
Want to parse the data into objects instead of dictionaries? You can do that as well
Which produces the output
This code is from my teacher's GitHub. It converts an XML string to a Python object. The advantage of this approach is that it works on any XML.
Implement logic:
Define data:
Load object from XML:
Test:
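A minimal sketch of a generic converter in this spirit. It is not the code from the GitHub repository mentioned above; the `to_object` function is a hypothetical helper built on ElementTree and `types.SimpleNamespace`, and it ignores XML attributes for brevity.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace

def to_object(elem):
    """Recursively convert an Element into nested SimpleNamespace objects."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text (may be None)
    obj = SimpleNamespace()
    for child in children:
        value = to_object(child)
        if hasattr(obj, child.tag):
            # Repeated tag: collect the values into a list.
            existing = getattr(obj, child.tag)
            if not isinstance(existing, list):
                existing = [existing]
                setattr(obj, child.tag, existing)
            existing.append(value)
        else:
            setattr(obj, child.tag, value)
    return obj

# Hypothetical sample data in the shape of the question's XML.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Bitrate>128</Bitrate>
  </file>
</encspot>"""

encspot = to_object(ET.fromstring(XML_TEXT))
print(encspot.file[0].Name, encspot.file[1].Bitrate)
```

Tags that are not valid Python identifiers, such as `Freq.`, would still be stored but have to be read with `getattr(obj, "Freq.")`.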
If you have a static function that converts XML to an object, it would look something like this:
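One possible shape for such a function, as a sketch only. The `AudioFile` class, its field selection, and the method names are hypothetical; they are not taken from the original answer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class AudioFile:
    name: str
    encoder: str
    bitrate: int

    @staticmethod
    def from_element(elem):
        """Build one AudioFile from a <file> element."""
        return AudioFile(
            name=elem.findtext("Name"),
            encoder=elem.findtext("Encoder"),
            bitrate=int(elem.findtext("Bitrate")),
        )

    @staticmethod
    def list_from_xml(xml_text):
        """Parse a whole <encspot> document into a list of AudioFile objects."""
        root = ET.fromstring(xml_text)
        return [AudioFile.from_element(e) for e in root.findall("file")]

# Hypothetical sample data in the shape of the question's XML.
XML_TEXT = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Encoder>Gogo (after 3.0)</Encoder>
    <Bitrate>131</Bitrate>
  </file>
</encspot>"""

files = AudioFile.list_from_xml(XML_TEXT)
print(files)
```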