How do I map to a dictionary rather than a list?

Posted 2024-09-30 21:39:08 · 955 characters · 6 views · 0 comments

I have the following function, which does a basic job of mapping an lxml object to a dictionary:

from lxml import etree 

tree = etree.parse('file.xml')
root = tree.getroot()

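def xml_to_dict(el):
    d = {}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        # map() returns a list here, which replaces the text/dict value above
        d[el.tag] = map(xml_to_dict, children)
    return d

# call the function at module level, outside the function body
v = xml_to_dict(root)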

At the moment it gives me:

>>> print v
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}

But I would like:

>>> print v
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}

How do I rewrite the function xml_to_dict(el) so that I get the required output?

Here's the XML I'm parsing, for clarity:

<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>

Thanks :)


静谧幽蓝 2024-10-07 21:39:09

Well, map() always returns a sequence (a list in Python 2, an iterator in Python 3), never a dict, so the easy answer is "don't use map()". Instead, build a dictionary like you already are, by looping over the children and assigning the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want to use the tag as the key and have the value be a list of items with that tag, so it would become something like:

import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d={}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_dict(child))
    if child_dicts:
        d[el.tag] = child_dicts
    return d

xml_to_dict(root)

This leaves the tag entry in the dict as a defaultdict; if you want a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, like before, if a tag has both text and children the text won't appear in the dict. You may want to think about a different layout for your dict to cope with that.
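One possible layout for the text-plus-children case, as a rough sketch only: keep the children grouped by tag as above and store the element's own text under a separate key. The '#text' key name is an assumption made up for this example, and the stdlib xml.etree.ElementTree stands in for lxml so the snippet runs without extra dependencies (both expose .tag, .text and child iteration). Tail text after a child element is still ignored.

```python
import collections
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

XML = """<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>"""

def xml_to_item(el):
    # Treat whitespace-only text (indentation between tags) as "no text".
    text = (el.text or '').strip()
    children = collections.defaultdict(list)
    for child in el:
        children[child.tag].append(xml_to_item(child))
    if not children:
        return text
    item = dict(children)
    if text:
        # '#text' is a hypothetical key; it cannot clash with a real tag name.
        item['#text'] = text
    return item

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

result = xml_to_dict(ET.fromstring(XML))
mixed = xml_to_dict(ET.fromstring('<p>hello<b>x</b></p>'))
print(result)
print(mixed)
```

With this layout an element such as `<p>hello<b>x</b></p>` keeps both its text and its child, at the cost of callers having to know about the extra key.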

EDIT:

Code that would produce the output in your rephrased question wouldn't recurse in xml_to_dict -- because you only want a dict for the outer element, not for all child tags. So, you'd use something like:

import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

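def xml_to_item(el):
    # Default to None so an element with neither text nor children
    # does not raise UnboundLocalError at the return below.
    item = None
    if el.text:
        print '***write tag as string'
        item = el.text
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_item(child))
    return dict(child_dicts) or item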

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

print xml_to_dict(root)

This still doesn't handle tags with both text and children sanely. It turns the collections.defaultdict(list) into a normal dict, so the output is (almost) as you expect; the remaining differences are that 'aa' maps to a one-item list and the b values are strings:

***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
{'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}

(If you really want integers instead of strings for the text data in the b tags, you'll have to explicitly turn them into integers somehow.)
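For what it's worth, here is one way the explicit conversion might look, as a sketch under a few assumptions: to_value is a hypothetical helper invented for this example, single-item lists are unwrapped so that aa comes out as a plain value, and the stdlib xml.etree.ElementTree is used in place of lxml so it runs anywhere. Note that it converts every numeric text, so the first a becomes the integer 1 rather than the string '1' shown in the question.

```python
import collections
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

XML = """<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>"""

def to_value(text):
    # Hypothetical helper: return an int when the text parses as one,
    # otherwise the stripped string.
    text = (text or '').strip()
    try:
        return int(text)
    except ValueError:
        return text

def xml_to_item(el):
    children = collections.defaultdict(list)
    for child in el:
        children[child.tag].append(xml_to_item(child))
    if not children:
        return to_value(el.text)
    # Unwrap one-item lists so <aa> maps to '1a' instead of ['1a'].
    return {tag: items[0] if len(items) == 1 else items
            for tag, items in children.items()}

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

result = xml_to_dict(ET.fromstring(XML))
print(result)
```

Unwrapping one-item lists makes the shape of the result depend on the data (a tag that appears once gives a value, twice gives a list), so it is only worth doing if the consumer can cope with both.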


兰花执着 2024-10-07 21:39:09

Simpler:

from lxml import etree    
def recursive_dict(element):
    return element.tag, dict(map(recursive_dict, element)) or element.text

To use it:

>>> tree = etree.parse(file_name)
>>> recursive_dict(tree.getroot())
('root', {'tag1': text, 'tag2': {'subtag21': {'tag211': text}}})

Edit: Output of question's example:

('root', {'a': {'b': '2'}, 'aa': '1a'})

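etree does not actually skip anything here: dict() keeps only the last value for each key, so when sibling elements share a tag (the two a elements and the two b elements), the earlier ones are overwritten.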
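To see where the duplicates go, and one way to keep them, here is a small sketch. recursive_grouped is a hypothetical variant written for this example (it is not part of lxml), and the stdlib xml.etree.ElementTree stands in for lxml since iterating an element yields its children in both.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

XML = """<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>"""

def recursive_dict(element):
    # The one-liner from above: dict() keeps the last pair for a repeated key.
    return element.tag, dict(map(recursive_dict, element)) or element.text

def recursive_grouped(element):
    # Hypothetical variant: collect values per tag in a list instead of
    # letting later siblings overwrite earlier ones.
    grouped = {}
    for child in element:
        tag, value = recursive_grouped(child)
        grouped.setdefault(tag, []).append(value)
    return element.tag, grouped or element.text

root = ET.fromstring(XML)
collapsed = recursive_dict(root)
grouped = recursive_grouped(root)
print(collapsed)
print(grouped)
```

The grouped version gives the same nesting as the xml_to_item approach in the other answer, just returned as a (tag, value) pair.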
