How to rewrite this function to implement OrderedDict?
I have the following function, which does a crude job of parsing an XML file into a dictionary.
Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.
How do I change this so that it outputs an ordered dictionary which reflects the original order of the nodes when looped over with a for loop?
def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print x

for y in x['root']:
    print y
Outputs:
{'root': {
    'a': ['1'],
    'aa': [{'b': [{'c': ['2']}, '2']}],
    'aaaa': [{'bb': ['4']}],
    'aaa': ['3'],
    'aaaaa': ['5']
}}
a
aa
aaaa
aaa
aaaaa
How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?
XML file for reference:
<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>
Answers (3)
You could use the new OrderedDict dict subclass, which was added to the standard library's collections module in version 2.7✶. Actually, what you need is an Ordered+defaultdict combination, which doesn't exist, but it's possible to create one by subclassing OrderedDict as illustrated below:
✶ If your version of Python doesn't have OrderedDict, you should be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.
The output produced from your test XML file looks like this:
I think this is close to what you want.
Minor update:
Added a __reduce__() method, which allows instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.
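A minimal sketch of the approach in martineau's answer: an OrderedDict subclass that adds a defaultdict-style __missing__ method, plus the question's function rewritten to use it. It assumes Python 3 and swaps lxml for the standard library's xml.etree.ElementTree so it runs without third-party packages. The class name OrderedDefaultdict and the exact code are illustrative, not the answerer's verbatim implementation.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict


class OrderedDefaultdict(OrderedDict):
    """An OrderedDict that, like defaultdict, creates missing values on demand."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError("first argument must be callable or None")
        self.default_factory = default_factory
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        # Called by d[key] when key is absent, as in defaultdict.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Optional pickle support: rebuild with the factory, then re-add items in order.
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, iter(self.items())


def xml_to_item(el):
    """Return the element's text, or an ordered mapping of tag -> list of children."""
    child_dicts = OrderedDefaultdict(list)
    for child in el:  # iterating an Element yields its children in document order
        child_dicts[child.tag].append(xml_to_item(child))
    return OrderedDict(child_dicts) or el.text


def simplexml_load_file(file):
    """Parse a filename or file object into {root_tag: item}, preserving order."""
    root = ET.parse(file).getroot()
    return OrderedDict([(root.tag, xml_to_item(root))])


xml = """<root>
  <a>1</a>
  <aa><b><c>2</c></b><b>2</b></aa>
  <aaa>3</aaa>
  <aaaa><bb>4</bb></aaaa>
  <aaaaa>5</aaaaa>
</root>"""

x = simplexml_load_file(io.StringIO(xml))
for y in x["root"]:
    print(y)  # a, aa, aaa, aaaa, aaaaa (one per line, in document order)
```

Each level is converted back to a plain OrderedDict, mirroring the original dict(child_dicts) call, so the returned structure does not silently create keys when a caller looks up a tag that isn't there.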
The recipe from martineau works for me, but it has problems with the copy() method it inherits. The following approach fixes this drawback:
Please note that this implementation does not make a deep copy, which, especially for default dictionaries, seems to be the right thing to do in most circumstances.
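A minimal sketch of a copy() override along those lines, assuming Python 3. The class is defined in full so the snippet runs on its own, and it is an illustration rather than the second answerer's code: copy() passes default_factory to the new instance explicitly and, as noted above, makes only a shallow copy.

```python
from collections import OrderedDict


class OrderedDefaultdict(OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError("first argument must be callable or None")
        self.default_factory = default_factory
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def copy(self):
        # Shallow copy: keeps key order and the factory, but the new mapping
        # shares the same value objects (e.g. the same lists) as the original.
        return type(self)(self.default_factory, self)

    __copy__ = copy  # make copy.copy() use the same logic


original = OrderedDefaultdict(list)
original["b"].append(1)
original["a"].append(2)

clone = original.copy()
clone["c"].append(3)                 # the factory survived the copy
print(list(clone))                   # ['b', 'a', 'c']
print(clone["b"] is original["b"])   # True: the lists are shared, not copied
```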
There are many possible implementations of OrderedDict listed in the answer here: How do you retrieve items from a dictionary in the order that they're inserted?
You can create your own OrderedDict module for use in your own code by copying one of the implementations. I assume you do not have access to OrderedDict because of the version of Python you are running.
One interesting aspect of your question is the possible need for defaultdict functionality. If you need this, you can implement the __missing__ method to get the desired effect.
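A minimal sketch of the __missing__ hook, assuming Python 3: when a dict subclass defines __missing__, a d[key] lookup for an absent key calls it, and OrderedDict subclasses get the same behavior. ListOrderedDict is a hypothetical name; unlike defaultdict, it hardcodes an empty list as the default instead of taking a factory.

```python
from collections import OrderedDict


class ListOrderedDict(OrderedDict):
    """Hypothetical example: an ordered dict whose missing keys default to []."""

    def __missing__(self, key):
        # Invoked by d[key] when key is absent; the new list is stored,
        # so insertion order reflects the first time each key was used.
        value = self[key] = []
        return value


groups = ListOrderedDict()
for tag, text in [("b", "x"), ("a", "y"), ("b", "z")]:
    groups[tag].append(text)

print(list(groups.items()))            # [('b', ['x', 'z']), ('a', ['y'])]
print("c" in groups, groups.get("c"))  # False None: only d[key] triggers __missing__
```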