How do I map to a dictionary instead of a list?
I have the following function, which does a basic job of mapping an lxml object to a dictionary...
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d = {}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        d[el.tag] = map(xml_to_dict, children)
    return d

v = xml_to_dict(root)
At the moment it gives me:
>>> print v
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}
But I would like:
>>> print v
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}
How do I rewrite the function xml_to_dict(el) so that I get the required output?
Here's the XML I'm parsing, for clarity.
<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>
thanks :)
Well, map() will always return a list (in Python 2, which this code uses), so the easy answer is "don't use map()". Instead, build a dictionary like you already are, by looping over children and assigning the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want to use the tag as the key and have the value be a list of items with that tag, so you would collect the child results in a collections.defaultdict(list), here called child_dicts, keyed by child tag, and store that under d[el.tag].

This leaves the tag entry in the dict as a defaultdict; if you want a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, like before, if a tag has both text and children, the text won't appear in the dict. You may want to think about a different layout for your dict to cope with that.

EDIT:
Code that would produce the output in your rephrased question wouldn't recurse in xml_to_dict, because you only want a dict for the outer element, not for all child tags. Children are converted to plain values instead, and only the outer element gets wrapped in a dict. This still doesn't handle tags with both text and children sanely, and it turns the collections.defaultdict(list) into a normal dict so the output is (almost) as you expect. (If you really want integers instead of strings for the text data in the b tags, you'll have to explicitly turn them into integers somehow.)
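The two variants described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's original code: it is written for Python 3, uses the standard library's xml.etree.ElementTree in place of lxml so it runs without extra packages (iterating over an element yields its children in both libraries), strips whitespace-only text, and the names xml_to_dict_nested and xml_to_item are hypothetical.

```python
import collections
import xml.etree.ElementTree as ET

XML = """<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""


def xml_to_dict_nested(el):
    """First variant: recurse, wrapping every element in its own {tag: ...} dict."""
    text = (el.text or "").strip()  # indentation between tags is not real text
    d = {el.tag: text if text else {}}
    child_dicts = collections.defaultdict(list)
    for child in el:  # loop over children instead of calling map()
        child_dicts[child.tag].append(xml_to_dict_nested(child))
    if child_dicts:
        # Children win over text; dict() turns the defaultdict into a plain dict.
        d[el.tag] = dict(child_dicts)
    return d


def xml_to_item(el):
    """Second variant helper: return a plain value, not a {tag: ...} dict."""
    child_items = collections.defaultdict(list)
    for child in el:
        child_items[child.tag].append(xml_to_item(child))
    text = (el.text or "").strip()
    # An element with children becomes a dict of lists; a leaf becomes its text.
    return dict(child_items) or text


def xml_to_dict(el):
    """Second variant: only the outer element is wrapped in a dict."""
    return {el.tag: xml_to_item(el)}


root = ET.fromstring(XML)
print(xml_to_dict_nested(root))
print(xml_to_dict(root))
```

With the question's XML, the second variant prints {'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}. That differs from the requested output in two ways: 'aa' is a one-element list because every tag maps to a list, and the b values stay strings unless converted explicitly.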
Simpler:
To use it:
Edit: Output of question's example:
It seems etree skips duplicate elements.
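For illustration, here is one compact recursive formulation of this kind. It is an assumption about what the simpler version might look like, not the answerer's snippet; it targets Python 3 and uses the standard library's xml.etree.ElementTree rather than lxml, and the name recursive_dict is chosen for the example. It also suggests a likely cause of the missing entries: if the children are collected into a plain dict keyed by tag, later siblings with the same tag overwrite earlier ones, while the parser itself still reports every element.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""


def recursive_dict(element):
    """Return (tag, value): a dict of children, or the stripped text for a leaf."""
    # dict() over (tag, value) pairs keeps only the LAST pair for each tag.
    children = dict(map(recursive_dict, element))
    return element.tag, children or (element.text or "").strip()


root = ET.fromstring(XML)
tag, value = recursive_dict(root)
print({tag: value})

# The parser keeps both <a> elements; the loss happens when building the dict.
print(len(root.findall("a")))
```

Here the first <a> (text "1") and the first <b> disappear from the result because their keys are reused, which is why a list-per-tag layout is needed when tags can repeat.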