Module to create a Python object representation from XML
I'm looking for an easy-to-use, native Python module that creates a Python object representation from XML.
I found several modules via Google (one of them is XMLObject) but didn't want to try out all of them.
What do you think is the best way to do this?
EDIT: I forgot to mention that the XML I'd like to read is not generated by me. It's an existing XML file whose structure I have no control over.
You say you want an object representation, which I would interpret to mean that nodes become objects, and the attributes and children of the node are represented as attributes of the object (possibly according to some Schema). This is what XMLObject does, I believe.
There are some packages that I know of. 4Suite includes some tools to do this, and I believe Amara specifically implements this (built on top of 4Suite). You can also use lxml.objectify, which was inspired by Amara and gnosis.xml.objectify.
Of course, a third option is to start from a concrete representation of the XML (using ElementTree or lxml) and build your own custom model around it. lxml.html is an example of that: it extends the base interface of lxml with some HTML-specific functionality.
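To make that third option a bit more concrete, here is a minimal sketch of a custom model wrapped around the standard-library ElementTree. The `Node` class and the sample document are hypothetical names made up for illustration; this is not how XMLObject, Amara, or lxml.objectify are implemented.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper exposing child elements and XML attributes
    as ordinary Python attributes."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        children = self._element.findall(name)
        if len(children) == 1:
            return Node(children[0])                    # single child -> one Node
        if children:
            return [Node(child) for child in children]  # repeated child -> list
        if name in self._element.attrib:
            return self._element.attrib[name]           # fall back to XML attributes
        raise AttributeError(name)

    @property
    def text(self):
        # Note: this shadows any child element that is itself named "text".
        return (self._element.text or "").strip()


# Made-up sample document for the demonstration.
SAMPLE = """<library name="city">
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""

library = Node(ET.fromstring(SAMPLE))
titles = [book.title.text for book in library.book]
print(library.name, titles)
```

One design caveat: a child that appears once comes back as a single `Node` and a repeated child comes back as a list, so code consuming an XML file with an unknown structure has to handle both cases.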
I second the suggestion of xml.etree.ElementTree, mostly because it's now in the stdlib.
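There is also a faster C implementation, xml.etree.cElementTree (on Python 3.3 and later, xml.etree.ElementTree uses the C accelerator automatically, and cElementTree was removed in Python 3.9).
If you really need performance, I would suggest lxml: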
http://www.ibm.com/developerworks//xml/library/x-hiperfparse/
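For reference, a minimal sketch of reading an existing document with the stdlib xml.etree.ElementTree. The sample XML and the variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; in practice you would call ET.parse(path).getroot().
SAMPLE = """<config version="2">
  <server host="example.org" port="8080"/>
  <server host="example.net" port="9090"/>
</config>"""

root = ET.fromstring(SAMPLE)

# XML attributes are read with .get(); values always come back as strings.
version = root.get("version")

# .iter(tag) walks every descendant element with that tag.
servers = [(s.get("host"), int(s.get("port"))) for s in root.iter("server")]

print(version, servers)
```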
I've heard the easiest is ElementTree, though I rarely work with XML and I can't say anything from experience.
There's also the excellent third-party library pyxser for Python.
I use (and like) PyRXP, which creates a tuple built from the XML document.
The main issue with a straight XML -> Python object structure is that there is no Python analog for an attributed list, that is, a list of elements that also happens to have attributes. If you like, it is both a list and a dictionary at the same time.
I parse the result from PyRXP, and create the list/dictionary depending upon the structure - the XML I am dealing with is either list or attribute-based, never both. (I am consuming data from a known source).
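The "list or dictionary, depending on the structure" idea can be sketched roughly as follows. To stay within the standard library this uses ElementTree rather than PyRXP, so it is an illustration of the approach and not the code described above; `to_python` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_python(element):
    """Map an element to a dict (attribute-based), a list (child-based),
    or a string (leaf). Refuse elements that are both at once."""
    children = list(element)
    if element.attrib and not children:
        return dict(element.attrib)                     # attribute-based -> dict
    if children and not element.attrib:
        return [to_python(child) for child in children] # list-based -> list
    if not children and not element.attrib:
        return (element.text or "").strip()             # plain leaf -> text
    # The "attributed list" case: no single built-in type fits.
    raise ValueError(f"<{element.tag}> has both attributes and children")


# Made-up sample data.
points = to_python(ET.fromstring(
    '<points><point x="1" y="2"/><point x="3" y="4"/></points>'
))
print(points)

try:
    to_python(ET.fromstring('<points unit="cm"><point x="1"/></points>'))
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

Raising on the mixed case only works when, as in the answer, the data comes from a known source that never combines the two.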
Python has the pickle and cPickle modules for Python object serialization. Both modules serialize and deserialize a Python object hierarchy to and from a byte stream.
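A minimal round trip with the standard-library pickle module might look like the sketch below. The `record` data is made up. On Python 3 there is no separate cPickle module; pickle uses its C implementation automatically when available.

```python
import pickle

# Made-up object hierarchy to serialize.
record = {"name": "example", "tags": ["a", "b"], "nested": {"depth": 1}}

payload = pickle.dumps(record)    # object hierarchy -> bytes
restored = pickle.loads(payload)  # bytes -> a new, equal object hierarchy

print(type(payload).__name__, restored == record)
```

Note that pickle should only be used to load data you trust, since unpickling can execute arbitrary code.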
The following provides a similar interface, pickle() and unpickle(), for serialization to and from XML.
I've had pretty good luck with Wai Yip Tung's xml2obj function available here:
http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
It's about 84 lines of code. It's native, pure Python and uses only the xml.sax and re (regular expression) libraries. You just pass it XML and get back your object.
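To give a feel for how an xml.sax-based converter of this kind can work, here is a much smaller sketch. It is not the recipe's code: `TreeBuilder`, `parse_to_dict`, and the nested-dict layout are hypothetical choices made for illustration.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical SAX handler that builds nested dicts from parse events."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.root = None

    def startElement(self, name, attrs):
        node = {"tag": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # May be called several times per text node, so accumulate.
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        node = self.stack.pop()
        node["text"] = node["text"].strip()


def parse_to_dict(xml_text):
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


# Made-up sample document.
order = parse_to_dict('<order id="7"><item>pen</item><item>ink</item></order>')
items = [child["text"] for child in order["children"]]
print(order["attrs"], items)
```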