How can I rewrite this function to implement OrderedDict?

Posted 2024-10-01 01:08:44

I have the following function which does a crude job of parsing an XML file into a dictionary.

Unfortunately, since Python dictionaries are not ordered (the built-in dict only preserves insertion order from Python 3.7 on), I am unable to loop through the nodes in the order I would like.

How do I change this so that it outputs an ordered dictionary that reflects the original order of the nodes when looped over with for?

def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        # Group child results by tag; each tag maps to a list of values.
        child_dicts = collections.defaultdict(list)
        for child in el:  # iterating an element yields its children (getchildren() is deprecated)
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print(x)

for y in x['root']:
    print(y)

Outputs:

{'root': {
    'a': ['1'],
    'aa': [{'b': [{'c': ['2']}, '2']}],
    'aaaa': [{'bb': ['4']}],
    'aaa': ['3'],
    'aaaaa': ['5']
}}

a
aa
aaaa
aaa
aaaaa

How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?

XML file for reference:

<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>
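The ordering the question relies on can be sketched with the standard library alone. Assuming xml.etree.ElementTree in place of lxml, and with the reference XML inlined as a string, iterating an element yields its children in document order, and an OrderedDict keyed by tag keeps the order in which each tag was first inserted:

```python
import collections
import xml.etree.ElementTree as ET

# The reference XML from the question, inlined so the example is self-contained.
XML = """<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>"""

root = ET.fromstring(XML)

# Iterating an element yields its direct children in document order.
tags_in_document_order = [child.tag for child in root]

# An OrderedDict remembers the order in which keys were first inserted,
# so looping over it reproduces the document order of the tags.
by_tag = collections.OrderedDict()
for child in root:
    by_tag.setdefault(child.tag, []).append(child.text)

for tag in by_tag:
    print(tag)
```

Looping over by_tag prints a, aa, aaa, aaaa and aaaaa in that order, matching the file.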


Answers (3)

高冷爸爸 2024-10-08 01:08:44

You could use the new OrderedDict dict subclass, which was added to the standard library's collections module in version 2.7. Actually, what you need is an Ordered+defaultdict combination, which doesn't exist, but it's possible to create one by subclassing OrderedDict as illustrated below:

If your version of Python doesn't have OrderedDict, you should be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        # Invoked by dict.__getitem__ when key is absent; inserts and returns a default.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # Optional, for pickle support.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # Optional.
        # list() so that Python 3 shows the items rather than an odict_items view.
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory, list(self.items()))

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        # Group child results by tag, remembering the order in which tags first appear.
        child_dicts = OrderedDefaultdict(list)
        for child in el:  # iterating an element yields its children in document order
            child_dicts[child.tag].append(xml_to_item(child))
        # An empty OrderedDict is falsy, so leaf elements return their text instead.
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this:

{'root':
    OrderedDict(
        [('a', ['1']),
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
         ('aaa', ['3']),
         ('aaaa', [OrderedDict([('bb', ['4'])])]),
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

Which I think is close to what you want.
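On Python 3.7 and later the built-in dict is guaranteed to preserve insertion order, so a plain dict can stand in for OrderedDict here. The sketch below swaps in that approach (dict.setdefault instead of the OrderedDefaultdict class), assumes xml.etree.ElementTree.fromstring in place of lxml's etree.parse, and uses a hypothetical helper name, xml_string_to_dict, with the reference XML collapsed onto one line:

```python
import xml.etree.ElementTree as ET

def xml_string_to_dict(xml_text):
    """Hypothetical helper: parse an XML string into nested, order-preserving dicts.

    Relies on dict keeping insertion order, which is guaranteed from Python 3.7.
    """
    root = ET.fromstring(xml_text)

    def xml_to_item(el):
        item = el.text or None
        child_dicts = {}
        for child in el:
            # setdefault inserts a new list the first time a tag is seen,
            # so keys appear in the order their tags first occur.
            child_dicts.setdefault(child.tag, []).append(xml_to_item(child))
        # An empty dict is falsy, so leaf elements return their text instead.
        return child_dicts or item

    return {root.tag: xml_to_item(root)}

XML = ("<root><a>1</a><aa><b><c>2</c></b><b>2</b></aa><aaa>3</aaa>"
       "<aaaa><bb>4</bb></aaaa><aaaaa>5</aaaaa></root>")

x = xml_string_to_dict(XML)
for y in x['root']:
    print(y)
```

The loop prints a, aa, aaa, aaaa and aaaaa; on Python versions older than 3.7 the OrderedDefaultdict version remains the safer choice.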

Minor update:

Added a __reduce__() method which will allow the instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.
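The pickling support mentioned in the update can be checked with a short round trip. This sketch repeats the answer's OrderedDefaultdict (minus the optional __repr__) so it runs on its own, then confirms that key order and default_factory survive pickle.dumps and pickle.loads:

```python
import collections
import pickle

class OrderedDefaultdict(collections.OrderedDict):
    """A defaultdict with OrderedDict as its base class (as in the answer)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems): pickle rebuilds the
        # object as OrderedDefaultdict(*args), then sets each (key, value) pair
        # from the dictitems iterator, in order.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

d = OrderedDefaultdict(list)
d['z'].append(1)
d['a'].append(2)

restored = pickle.loads(pickle.dumps(d))
print(list(restored.items()))   # [('z', [1]), ('a', [2])]
print(restored.default_factory)  # <class 'list'>
```

Because default_factory is passed back through the constructor arguments, the restored object still creates defaults for missing keys.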

风启觞 2024-10-08 01:08:44

The recipe from martineau works for me, but it has problems with the copy() method inherited from OrderedDict. The following approach fixes this drawback:

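from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    # Implementation (__init__, __missing__, etc.) as suggested by martineau

    def copy(self):
        # Pass default_factory explicitly; the inherited copy() does not.
        return type(self)(self.default_factory, self)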

Please note that this implementation does not make a deep copy; especially for default dictionaries, a shallow copy seems to be the right thing to do in most circumstances.
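To illustrate the copy() override and its shallow semantics, here is a sketch that adds the copy() method to a trimmed-down version of martineau's class (only __init__ and __missing__ are repeated). It checks that the copy keeps default_factory and key order, and that, being shallow, the copy shares the same list objects as the original:

```python
import collections

class OrderedDefaultDict(collections.OrderedDict):
    """Trimmed-down version of martineau's class plus the copy() fix."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def copy(self):
        # Pass default_factory explicitly so the copy behaves like the original;
        # the values themselves are copied by reference (shallow copy).
        return type(self)(self.default_factory, self)

original = OrderedDefaultDict(list)
original['b'].append(1)
original['a'].append(2)

clone = original.copy()
clone['c'].append(3)  # works because default_factory was preserved

print(list(original))                # ['b', 'a']
print(list(clone))                   # ['b', 'a', 'c']
print(clone['b'] is original['b'])   # True: the lists are shared
```

If independent lists are needed, copy.deepcopy would have to be used instead, at the cost of copying every value.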

青春如此纠结 2024-10-08 01:08:44

There are many possible implementations of OrderedDict listed in the answer to the question "How do you retrieve items from a dictionary in the order that they're inserted?"

You can create your own OrderedDict module for use in your own code by copying one of those implementations. I assume you do not have access to OrderedDict because of the version of Python you are running.

One interesting aspect of your question is the possible need for defaultdict functionality. If you need this, you can implement the __missing__ method to get the desired effect.
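As a minimal sketch of the __missing__ approach: dict.__getitem__ calls __missing__ on a subclass when a key is absent, so an OrderedDict subclass can supply default values itself. The class name OrderedListDict below is hypothetical, and it hard-codes list as the default rather than accepting a factory like defaultdict does:

```python
import collections

class OrderedListDict(collections.OrderedDict):
    """Hypothetical example: an OrderedDict whose missing keys default to []."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is not present;
        # stores a new empty list and returns it.
        value = self[key] = []
        return value

groups = OrderedListDict()
for tag, text in [('a', '1'), ('b', '2'), ('a', '3')]:
    groups[tag].append(text)

print(list(groups.items()))  # [('a', ['1', '3']), ('b', ['2'])]
```

Keys stay in first-insertion order, which is the combination of OrderedDict and defaultdict behaviour the question needs.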
