I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.
Python >= 3.6
In Python 3.6+, it seems that dict loading order is preserved by default without special dictionary types. The default Dumper, on the other hand, sorts dictionaries by key. Starting with PyYAML 5.1, you can turn this off by passing sort_keys=False:
import yaml

a = dict(zip("unsorted", "unsorted"))
s = yaml.safe_dump(a, sort_keys=False)
b = yaml.safe_load(s)
assert list(a.keys()) == list(b.keys())  # True
This works thanks to the new dict implementation that has been in use in PyPy for some time. While still considered an implementation detail in CPython 3.6, "the insertion-order preserving nature of dicts has been declared an official part of the Python language spec" as of 3.7+, see What's New In Python 3.7.
Note that this is still undocumented on the PyYAML side, so you shouldn't rely on it for safety-critical applications.
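To make the language guarantee concrete, here is a small standard-library sketch (no PyYAML involved; the variable names are only illustrative). It shows that a plain dict keeps insertion order on Python 3.7+, and that the remaining practical difference from OrderedDict is order-sensitive equality:

```python
from collections import OrderedDict

# Plain dicts keep insertion order (a language guarantee since Python 3.7).
plain = {}
for key in "unsorted":
    plain[key] = key.upper()
insertion_order = list(plain)

# Equality is where the two types still differ:
# dict comparison ignores order, OrderedDict comparison does not.
same_as_dict = dict(a=1, b=2) == dict(b=2, a=1)
same_as_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)

print(insertion_order, same_as_dict, same_as_ordered)
```

If code downstream compares mappings and expects order to matter, OrderedDict is still the type to load into.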
Original answer (compatible with all known versions)
I like @James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. This is a bad idea, especially when writing library code. Also, it doesn't directly work with yaml.safe_load().
Fortunately, the solution can be improved without much effort:
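The code for this step is not preserved on this page. A minimal sketch of the idea, assuming the improvement is to register the mapping constructor on a throwaway Loader subclass instead of on the global class (the names ordered_load and OrderedLoader are illustrative, not part of PyYAML):

```python
from collections import OrderedDict

import yaml


def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
    """Load YAML, building every mapping with object_pairs_hook."""

    # Subclass locally so nothing is registered on the shared Loader class.
    class OrderedLoader(Loader):
        pass

    def construct_mapping(loader, node):
        loader.flatten_mapping(node)  # resolve "<<" merge keys first
        return object_pairs_hook(loader.construct_pairs(node))

    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_mapping
    )
    return yaml.load(stream, OrderedLoader)


data = ordered_load("b: 1\na: 2\nnested:\n  z: 0\n  y: 1\n")
print(data)
```

Because the default Loader here is yaml.SafeLoader, the sketch behaves like an ordered yaml.safe_load(); passing Loader=yaml.Loader would give the unrestricted variant.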
oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:
import oyaml as yaml
You'll no longer be annoyed by screwed-up mappings when dumping/loading.
2015 (and later) option:
ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it also preserves comments, anchor names, and tags, and it supports the YAML 1.2 specification (released 2009).
The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file, and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper:
import sys
from ruamel.yaml import YAML

# The nested keys under "conf" must be indented for the lookups below to work.
yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)
will give you:
3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else
data is of type CommentedMap, which functions like a dict but carries extra information that is kept around until it is dumped (including the preserved comment!).
In my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. The load commands now support an object_pairs_hook option. A diff of the changes I made is below.
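The diff itself is not reproduced on this page. For orientation only: the option name matches the object_pairs_hook parameter of the standard library's json.loads, which hands each object's key/value pairs to the hook in document order. Whether the patched PyYAML behaves identically is an assumption; the sketch below shows only the json behaviour:

```python
import json
from collections import OrderedDict

text = '{"b": 1, "a": {"z": 0, "y": 1}}'

# The hook is called with a list of (key, value) pairs for every JSON object,
# so nested objects become OrderedDicts as well.
data = json.loads(text, object_pairs_hook=OrderedDict)
print(data)
```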
Here's a simple solution that also checks for duplicated top-level keys in your map.
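import re
from collections import OrderedDict

import yaml


def yaml_load_od(fname):
    """Load a YAML file as an OrderedDict.

    Fails on duplicated top-level keys and preserves the order of the
    top-level keys only (nested mappings are left as loaded).
    """
    # 1st pass: scan the raw text for top-level keys (lines starting in column 0).
    with open(fname, 'r') as f:
        lines = f.read().splitlines()
    top_keys = []
    duped_keys = []
    for line in lines:
        m = re.search(r'^([A-Za-z0-9_]+) *:', line)
        if m:
            if m.group(1) in top_keys:
                duped_keys.append(m.group(1))
            else:
                top_keys.append(m.group(1))
    if duped_keys:
        raise ValueError('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass: parse the file and rebuild the top level in source order.
    # safe_load is used because yaml.load() without a Loader fails on PyYAML 6.
    # Keys are matched as strings, so keys that YAML loads as non-strings
    # (e.g. 123 or yes) will raise KeyError here.
    with open(fname, 'r') as f:
        d_tmp = yaml.safe_load(f)
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])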
For serialization, you could use the following function:
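The function itself is not preserved on this page. A minimal sketch of what such a function could look like, assuming it mirrors the loading approach by registering an OrderedDict representer on a local Dumper subclass (the names ordered_dump and OrderedDumper are illustrative, not part of PyYAML):

```python
from collections import OrderedDict

import yaml


def ordered_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
    """Dump data as YAML, writing OrderedDicts in their own key order."""

    # Subclass locally so nothing is registered on the shared Dumper class.
    class OrderedDumper(Dumper):
        pass

    def represent_ordered(dumper, mapping):
        # Passing the items view (not the dict) keeps PyYAML from sorting keys.
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, mapping.items()
        )

    OrderedDumper.add_representer(OrderedDict, represent_ordered)
    return yaml.dump(data, stream, OrderedDumper, **kwds)


text = ordered_dump(OrderedDict([("b", 1), ("a", 2)]), default_flow_style=False)
print(text)
```

With the default yaml.SafeDumper this acts as an ordered yaml.safe_dump(); extra keyword arguments are passed straight through to yaml.dump().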
In each case, you could also make the custom subclasses global, so that they don't have to be recreated on each call.
Note: I'm the author of oyaml.
The yaml module allows you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.
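As a rough sketch of that mechanism (the helper names dict_representer and dict_constructor are illustrative, not part of PyYAML), a representer and a constructor for OrderedDict can be registered on the module-level defaults. Note that this modifies the shared yaml.Dumper and yaml.Loader classes, so it affects every caller in the process and does not apply to yaml.safe_load():

```python
from collections import OrderedDict

import yaml

MAPPING_TAG = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG


def dict_representer(dumper, data):
    # Emit the pairs in the OrderedDict's own order, without key sorting.
    return dumper.represent_mapping(MAPPING_TAG, data.items())


def dict_constructor(loader, node):
    # Build every YAML mapping as an OrderedDict, in document order.
    return OrderedDict(loader.construct_pairs(node))


yaml.add_representer(OrderedDict, dict_representer)  # default yaml.Dumper
yaml.add_constructor(MAPPING_TAG, dict_constructor)  # default yaml.Loader

loaded = yaml.load("b: 1\na: 2\n", Loader=yaml.Loader)
dumped = yaml.dump(loaded, default_flow_style=False)
print(type(loaded).__name__, dumped)
```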
Note: there is a library, based on the following answer, which also implements the CLoader and CDumpers: Phynix/yamlloader
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
Update: the library was deprecated in favor of yamlloader (which is based on yamlordereddictloader).
I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use: