Preserving attribute order when modifying XML with minidom

Posted 2024-07-15 03:32:03

Is there a way to preserve the original order of attributes when processing XML with minidom?

Say I have: <color red="255" green="255" blue="233" />
When I modify this with minidom, the attributes are rearranged alphabetically: blue, green, red. I'd like to preserve the original order.

I am processing the file by looping through the elements returned by elements = doc.getElementsByTagName('color') and then making assignments like e.attributes["red"].value = "233".

Comments (9)

白昼 2024-07-22 03:32:03

To keep the attribute order, I made this small modification to minidom:

from collections import OrderedDict

In the Element class:

__init__(...)
    self._attrs = OrderedDict()
    #self._attrs = {}
writexml(...)
    #a_names.sort()

This only works with Python 2.7+, where collections.OrderedDict is available.
I'm not sure it actually works, so use it at your own risk.

Also note that you should not rely on attribute order. The XML specification says:

Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
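
Because attribute order carries no meaning in XML, two elements can be compared without regard to it. A minimal sketch, assuming plain (non-namespaced) attributes; the helper name same_attributes is hypothetical and not part of minidom:

```python
from xml.dom import minidom


def same_attributes(elem_a, elem_b):
    """Return True if both elements carry the same attribute names and values,
    regardless of the order in which they were written."""
    # NamedNodeMap.items() yields (name, value) pairs; a dict ignores order.
    return dict(elem_a.attributes.items()) == dict(elem_b.attributes.items())


a = minidom.parseString('<color red="255" green="255" blue="233"/>').documentElement
b = minidom.parseString('<color blue="233" green="255" red="255"/>').documentElement
print(same_attributes(a, b))
```

A comparison like this may be enough when the only goal is to confirm that a modification did not change the data, although it does not help with line-based diff tools.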

眼睛会笑 2024-07-22 03:32:03

Is there a way I can preserve the original order of attributes when processing XML with minidom?

With minidom, no: the datatype used to store attributes is an unordered dictionary. pxdom can do it, though it is considerably slower.

回眸一笑 2024-07-22 03:32:03

It is clear that XML attributes are not ordered.
Even so, I just ran into this strange behavior.

It seems to be caused by a sort in the xml.dom.minidom.Element.writexml function:

class Element(Node):
... snip ...

    def writexml(self, writer, indent="", addindent="", newl=""):
        # indent = current indentation
        # addindent = indentation to add to higher levels
        # newl = newline string
        writer.write(indent+"<" + self.tagName)

        attrs = self._get_attributes()
        a_names = attrs.keys()
        a_names.sort()  # <-- this call reorders the attributes
        for a_name in a_names:
            writer.write(" %s=\"" % a_name)
            _write_data(writer, attrs[a_name].value)
            writer.write("\"")

Removing that line restores output that keeps the attribute order of the original document.
This is useful when you need to check with diff tools that your code has not introduced a mistake.

傲娇萝莉攻 2024-07-22 03:32:03

Before Python 2.7, I used the following hot patch:

from xml.dom import minidom

class _MinidomHooker(object):
    def __enter__(self):
        # Swap in a keys() that returns a list whose sort() does nothing.
        minidom.NamedNodeMap.keys_orig = minidom.NamedNodeMap.keys
        minidom.NamedNodeMap.keys = self._NamedNodeMap_keys_hook
        return self

    def __exit__(self, *args):
        # Restore the original keys() method.
        minidom.NamedNodeMap.keys = minidom.NamedNodeMap.keys_orig
        del minidom.NamedNodeMap.keys_orig

    @staticmethod
    def _NamedNodeMap_keys_hook(node_map):
        class OrderPreservingList(list):
            def sort(self):
                # writexml() calls a_names.sort(); make it a no-op.
                pass
        return OrderPreservingList(node_map.keys_orig())

It is used this way:

with _MinidomHooker():
    document.writexml(...)

Disclaimers:

  1. Thou shalt not rely on the order of attributes.
  2. Mutating the NamedNodeMap class is not thread-safe.
  3. Hot patching is evil.

べ映画 2024-07-22 03:32:03

You guys can put up as many disclaimers as you want.
While reordering the attributes has no meaning for the program, it does have a meaning for the programmer/user.

For Fredrick it was important to keep the RGB order, since that is the conventional order of the colors.
For me it is the name attribute in particular.

Compare

<field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/> <!-- ID -->
<field name="forkortelse" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="kortform" type="text_general" indexed="true" stored="true" required="false" multiValued="false" />
<field name="dato" type="date" indexed="true" stored="true" required="false" multiValued="false" />
<field name="nummer" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="kilde" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="tittel" type="text_general" indexed="true" stored="true" multiValued="true"/>

Against

<field indexed="true" multiValued="false" name="forkortelse" required="false" stored="true" type="string"/>
<field indexed="true" multiValued="false" name="kortform" required="false" stored="true" type="text_general"/>
<field indexed="true" multiValued="false" name="dato" required="false" stored="true" type="date"/>
<field indexed="true" multiValued="false" name="nummer" required="false" stored="true" type="int"/>
<field indexed="true" multiValued="false" name="kilde" required="false" stored="true" type="string"/>
<field an_optional_attr="OMG!" an_optional_attr2="OMG!!" indexed="true" name="tittel" stored="true" type="text_general"/>

While the second version is not impossible to read, it is not as easy. The name is the important attribute, and hiding the name field way back is no good. What if the name were 15 attributes in, with 7 of the attributes in front of it optional?

The point is that the reordering causes a bigger problem than whatever the ascending ordering gives in return. It messes with the way the programmer thinks or how the functionality is supposed to work. At the very least, the sorting should be configurable/optional.

Excuse my poor English. It is not my main language.

末が日狂欢 2024-07-22 03:32:03

1. Write your own 'Element.writexml' method.

Copy the code of Element's writexml from 'minidom.py' into your own file.

Rename it to writexml_nosort.

Delete 'a_names.sort()' (Python 2.7),
or change 'a_names = sorted(attrs.keys())' to 'a_names = attrs.keys()' (Python 3.4).

Replace Element's method with your own:

minidom.Element.writexml = writexml_nosort

2. Define your preferred order:

right_order = ['a', 'b', 'c', 'a1', 'b1']

3. Rebuild your element's _attrs in that order:

node._attrs = OrderedDict( [(k,node._attrs[k]) for k in right_order ] )
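
On Python 3.8 and later, where writexml no longer sorts, a similar reordering can be sketched with the public API only, by removing the attributes and adding them back in the wanted order. This is an illustration rather than the approach from the steps above: the function name reorder_attributes is hypothetical, and it assumes attributes without namespaces.

```python
from xml.dom import minidom


def reorder_attributes(element, preferred_order):
    """Re-add attributes so the names in preferred_order come first.

    Attributes not listed keep their relative order after the listed ones.
    Relies on minidom keeping insertion order when serializing (Python 3.8+).
    """
    current = element.attributes.items()  # list of (name, value) pairs
    values = dict(current)
    listed = [name for name in preferred_order if name in values]
    rest = [name for name, _ in current if name not in listed]
    # Remove everything first, then add back in the wanted order.
    for name, _ in current:
        element.removeAttribute(name)
    for name in listed + rest:
        element.setAttribute(name, values[name])


doc = minidom.parseString('<field indexed="true" name="url" type="string"/>')
reorder_attributes(doc.documentElement, ["name", "type"])
print(doc.documentElement.toxml())
```

Going through removeAttribute and setAttribute avoids touching the private _attrs mapping, at the cost of creating new attribute nodes.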

夏见 2024-07-22 03:32:03

minidom sorts the attributes when writing, in the writexml function of the Element class.
It is done like this:

a_names = sorted(attrs.keys())

You can change this to

a_names = list(attrs.keys())

For IDLE I had to change the file in
/usr/lib/python3.6/xml/dom. It seems that IDLE does not follow the sys.path order.
Don't forget to make a backup first.

行至春深 2024-07-22 03:32:03

Is there a way I can preserve the original order of attributes when processing XML with minidom?

Yes. Since Python 3.8, minidom preserves the original attribute order when serializing the XML document.

See https://docs.python.org/3/library/xml.dom.minidom.html#xml.dom.minidom.Node.writexml.
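
A minimal sketch of the asker's scenario, assuming Python 3.8 or later (on older versions the attributes come out sorted alphabetically instead); the palette wrapper element is made up for the example:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<palette><color red="255" green="255" blue="233"/></palette>'
)

# Same loop and assignment as in the question.
for e in doc.getElementsByTagName("color"):
    e.attributes["red"].value = "233"

result = doc.documentElement.toxml()
print(result)
```

With a recent interpreter the printed element should list red, green, blue in the order they appear in the input.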

梦忆晨望 2024-07-22 03:32:03

I ended up using the lxml library instead of minidom.
