Deep merge dictionaries of dictionaries in Python
I need to merge multiple dictionaries. Here's what I have, for instance:
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
with A, B, C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}.
The level (depth) of the dictionaries is unknown; it could be {2:{"c":{"z":{"y":{C}}}}}.
In my case it represents a directory/file structure, with nodes being docs and leaves being files.
I want to merge them to obtain:
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}
I'm not sure how I could do that easily with Python.
This is actually quite tricky, particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does).
Assuming you don't have huge numbers of entries, a recursive function is easiest.
Note that this mutates a: the contents of b are added to a (which is also returned). If you want to keep a, you could call it like merge(dict(a), b). Keep in mind that dict(a) is only a shallow copy, so nested dicts inside a are still modified; copy.deepcopy(a) protects them as well.
agf pointed out (below) that you may have more than two dicts, in which case you can apply the same function with reduce, so that everything is added to dict1.
Note: I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain.
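The code for this answer is not preserved in this copy, so the following is only a sketch of what such a function might look like. It assumes a conflict should raise ValueError and that the error message should name the path to the conflicting key; the sample dicts use string leaves and dict3 is made up for illustration.

```python
from functools import reduce


def merge(a, b, path=()):
    """Merge b into a in place and return a; raise on conflicting leaves."""
    for key, b_val in b.items():
        if key not in a:
            a[key] = b_val
        elif isinstance(a[key], dict) and isinstance(b_val, dict):
            merge(a[key], b_val, path + (str(key),))
        elif a[key] != b_val:
            raise ValueError("Conflict at " + ".".join(path + (str(key),)))
        # equal values are duplicate but consistent: nothing to do
    return a


dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
dict3 = {1: {"a": "A"}, 3: {"e": "E"}}  # hypothetical third input

# reduce folds merge over the list, so everything accumulates in dict1
merged = reduce(merge, [dict1, dict2, dict3])
# merged == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D", "e": "E"}}
```

Because new keys are assigned rather than copied, nested dicts that came from later arguments are shared with the result.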
You could try mergedeep.
Installation
Usage
Here's an easy way to do it using generators:
This prints:
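The original generator code and its printed output are missing from this copy. A generator-based merge could be sketched as follows, under the assumption that the second dict's value wins whenever the two values cannot both be merged as dicts.

```python
def mergedicts(dict1, dict2):
    """Yield (key, value) pairs of the deep merge of dict1 and dict2."""
    for k in set(dict1) | set(dict2):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                # both sides are dicts: merge them recursively
                yield k, dict(mergedicts(dict1[k], dict2[k]))
            else:
                # cannot merge further: assume the second dict wins
                yield k, dict2[k]
        elif k in dict1:
            yield k, dict1[k]
        else:
            yield k, dict2[k]


dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
result = dict(mergedicts(dict1, dict2))
# result == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D"}}
```

The generator only yields pairs, so the caller wraps it in dict() to build the merged result; neither input is modified.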
One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:
My use case is merging YAML files, where I only have to deal with a subset of the possible data types. Hence I can ignore tuples and other objects. For me, a sensible merge logic means
Everything else, including unforeseen cases, results in an error.
As this is the canonical question (in spite of certain non-generalities), I'm providing the canonical Pythonic approach to solving this issue.
Simplest Case: "leaves are nested dicts that end in empty dicts":
This is the simplest case for recursion, and I would recommend two naive approaches:
I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here's the usage:
Complex Case: "leaves are of any other type":
So if they end in dicts, it's a simple case of merging the end empty dicts. If not, it's not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?
So, in lieu of more information, the simplest approach is to give them the standard update treatment unless both values are dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.
And now
returns
Application to the original question:
I've had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+), as well as append a missing brace:
and rec_merge(dict1, dict2) now returns:
which matches the desired outcome of the original question (after changing, e.g., {A} to 'A').
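The rec_merge code itself did not survive in this copy. The rule described in this answer (recurse only when both values are dicts, otherwise let the second value overwrite the first) could be sketched like this; the body is an assumption, only the name rec_merge comes from the answer.

```python
def rec_merge(d1, d2):
    """Return a new dict with d2 merged over d1.

    Recurses only where both values are dicts; in every other case the
    value from d2 replaces the value from d1 (the dict.update rule).
    """
    merged = dict(d1)
    for key, value in d2.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = rec_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# The question's data, with the set literals replaced by strings
dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
dict3 = rec_merge(dict1, dict2)
# dict3 == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D"}}
```

Note the consequence spelled out above: rec_merge({"k": {"lots": "of info"}}, {"k": None}) returns {"k": None}.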
Based on @andrew cooke's answer. This version handles nested lists of dicts and also offers the option to update the values.
This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:
Output:
In case someone wants yet another approach to this problem, here's my solution.
Virtues: short, declarative, and functional in style (recursive, does no mutation).
Potential drawback: this might not be the merge you're looking for. Consult the docstring for semantics.
Based on the answer from @andrew cooke.
It takes care of nested lists in a better way.
Short-n-sweet:
This works like (and is built on) Python's dict.update method. It returns None (you can always add return d if you prefer), as it updates dict d in place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict's contents).
It will also work for other ("dict-like") mappings.
Example:
Where Python's regular dict.update method yields:
If you have an unknown level of dictionaries, then I would suggest a recursive function:
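Neither of the two code blocks above survived in this copy. A recursive, in-place update of the kind both answers describe might look like the sketch below; the name deep_update and the sample data are hypothetical, while the parameter names d and v follow the text.

```python
from collections.abc import Mapping


def deep_update(d, v):
    """Recursively update mapping d in place with v; returns None like dict.update."""
    for key, value in v.items():
        if isinstance(d.get(key), Mapping) and isinstance(value, Mapping):
            deep_update(d[key], value)  # descend one level
        else:
            d[key] = value  # keys in v overwrite existing keys in d


deep = {"a": {"x": 1, "y": 2}, "b": 1}
deep_update(deep, {"a": {"y": 20, "z": 30}})
# deep == {"a": {"x": 1, "y": 20, "z": 30}, "b": 1}

shallow = {"a": {"x": 1, "y": 2}, "b": 1}
shallow.update({"a": {"y": 20, "z": 30}})
# shallow == {"a": {"y": 20, "z": 30}, "b": 1}  (nested dict replaced wholesale)
```

Checking against collections.abc.Mapping rather than dict is what lets the same function handle other dict-like mappings.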
Overview
The following approach subdivides the problem of a deep merge of dicts into:
- A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b
- A recursive merger function f to be used together with merge
Implementation
A function for merging two (non-nested) dicts can be written in a lot of ways. I personally like
A nice way of defining an appropriate recursive merger function f is to use multipledispatch, which allows you to define functions that evaluate along different paths depending on the types of their arguments.
Example
To merge two nested dicts, simply use merge(f), e.g.:
Notes:
The advantages of this approach are:
- The function is built from smaller functions that each do a single thing, which makes the code simpler to reason about and test
- The behaviour is not hard-coded but can be changed and extended as needed, which improves code reuse (see example below).
Customization
Some answers also considered dicts that contain lists, e.g. of other (potentially nested) dicts. In this case one might want to map over the lists and merge them based on position. This can be done by adding another definition to the merger function f.
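The implementation is not preserved here, so the following only sketches the merge(f) design. To stay within the standard library it swaps multipledispatch for plain isinstance checks; the _MISSING sentinel and the "right-hand value wins" fallback are assumptions of this sketch.

```python
_MISSING = object()  # marks "key absent on this side" (assumption of this sketch)


def merge(f):
    """Return a shallow merge function that combines values key by key with f."""
    def _merge(a, b):
        keys = a.keys() | b.keys()
        return {k: f(a.get(k, _MISSING), b.get(k, _MISSING)) for k in keys}
    return _merge


def f(a, b):
    """Recursive merger; isinstance checks stand in for multipledispatch."""
    if b is _MISSING:
        return a
    if a is _MISSING:
        return b
    if isinstance(a, dict) and isinstance(b, dict):
        return merge(f)(a, b)
    if isinstance(a, list) and isinstance(b, list):
        # customization: merge lists position by position, keep any extra tail
        n = min(len(a), len(b))
        return [f(x, y) for x, y in zip(a, b)] + a[n:] + b[n:]
    return b  # otherwise the right-hand value wins


merged = merge(f)(
    {"a": {"x": 1}, "items": [{"p": 1}, 2]},
    {"a": {"y": 2}, "items": [{"q": 2}]},
)
# merged == {"a": {"x": 1, "y": 2}, "items": [{"p": 1, "q": 2}, 2]}
```

Changing the merge behaviour means changing only f; the shallow merge stays untouched.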
There's a slight problem with andrew cooke's answer: in some cases it modifies the second argument b when you modify the returned dict. Specifically, it's because of this line:
If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.
To fix this, the line would have to be substituted with this:
Where clone_dict is:
Still, this obviously doesn't account for list, set and other stuff, but I hope it illustrates the pitfalls when trying to merge dicts.
And for completeness' sake, here is my version, where you can pass it multiple dicts:
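The quoted line and clone_dict are missing from this copy. The pitfall can still be reproduced with a small sketch, assuming the offending line is a plain assignment of the form a[key] = b[key]. Here copy.deepcopy stands in for clone_dict, and the clone parameter is hypothetical.

```python
import copy


def merge(a, b, clone=lambda value: value):
    """Merge b into a in place; clone decides how values from b are stored."""
    for key in b:
        if key in a and isinstance(a[key], dict) and isinstance(b[key], dict):
            merge(a[key], b[key], clone)
        else:
            a[key] = clone(b[key])  # with the default clone this aliases b[key]
    return a


b = {"x": {"y": 1}}
aliased = merge({}, b)
aliased["x"]["y"] = 99
leaked = b["x"]["y"]  # 99: changing the result also changed b

b = {"x": {"y": 1}}
copied = merge({}, b, clone=copy.deepcopy)
copied["x"]["y"] = 99
kept = b["x"]["y"]  # 1: b is left alone
```

Unlike a dict-only clone, copy.deepcopy also covers lists, sets and other containers, at the cost of copying everything.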
I have an iterative solution, which works much better with big dicts and a lot of them (for example JSON documents):
Note that this will use the value in d2 to override d1 in case they are not both dicts (same as Python's dict.update()).
Some tests:
I've tested with around 1200 dicts: this method took 0.4 seconds, while the recursive solution took about 2.5 seconds.
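The iterative code is not preserved in this copy. One way to merge without recursion is to keep an explicit stack of (destination, source) pairs, as in this sketch; the function name is hypothetical and no timing claims are attached to it.

```python
def merge_iterative(d1, d2):
    """Merge d2 into d1 in place without recursion.

    The value from d2 overrides d1 unless both values are dicts,
    in which case the pair is pushed on the stack and merged later.
    """
    stack = [(d1, d2)]
    while stack:
        dst, src = stack.pop()
        for key, value in src.items():
            if isinstance(dst.get(key), dict) and isinstance(value, dict):
                stack.append((dst[key], value))
            else:
                dst[key] = value
    return d1


d1 = {"a": {"b": {"c": 1}}, "x": 1}
d2 = {"a": {"b": {"d": 2}}, "x": {"y": 2}}
merge_iterative(d1, d2)
# d1 == {"a": {"b": {"c": 1, "d": 2}}, "x": {"y": 2}}
```

Since the nesting depth only grows the stack list, this version is not bound by Python's recursion limit.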
As noted in many other answers, a recursive algorithm makes the most sense here. In general, when working with recursion, it is preferable to create new values rather than trying to modify any input data structure.
We need to define what happens at each merge step. If both inputs are dictionaries, this is easy: we copy across unique keys from each side, and recursively merge the values of the duplicated keys. It's the base cases that cause a problem. It will be easier to understand the logic if we pull out a separate function for that. As a placeholder, we could just wrap the two values in a tuple:
Now the core of our logic looks like:
Let's test it:
We can easily modify the leaf-merging rule, for example:
and observe the effects:
We could also potentially clean this up by using a third-party library to dispatch based on the type of the inputs. For example, using multipledispatch, we could do things like:
This allows us to handle various combinations of leaf-type special cases without writing our own type checking, and also replaces the type check in the main recursive function.
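The code for this answer is missing here. The steps it describes (a separate leaf rule, a tuple placeholder, and a recursive core that builds new dicts) could be sketched as follows; the function names and the leaf parameter are assumptions, and the multipledispatch variant is left out.

```python
def merge_leaves(x, y):
    """Placeholder base case: keep both values, wrapped in a tuple."""
    return (x, y)


def deep_merge(x, y, leaf=merge_leaves):
    """Return a new merged value; the inputs are never modified."""
    if not (isinstance(x, dict) and isinstance(y, dict)):
        return leaf(x, y)
    result = {}
    for key in x.keys() | y.keys():
        if key in x and key in y:
            result[key] = deep_merge(x[key], y[key], leaf)  # duplicated key
        else:
            result[key] = x[key] if key in x else y[key]  # unique key
    return result


a = {"k": 1, "n": {"p": 2}}
b = {"k": 3, "n": {"q": 4}}
both = deep_merge(a, b)
# both == {"k": (1, 3), "n": {"p": 2, "q": 4}}

# A different leaf rule: the right-hand value wins
right = deep_merge(a, b, leaf=lambda x, y: y)
# right == {"k": 3, "n": {"p": 2, "q": 4}}
```

Values under unique keys are carried over as-is rather than copied, so they stay shared with the inputs.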
You can use the merge function from the toolz package, for example:
This version of the function accounts for any number N of dictionaries, and only dictionaries: passing any other kind of argument raises a TypeError. The merge itself accounts for key conflicts, and instead of overwriting data from a dictionary further down the merge chain, it creates a collection of the conflicting values (a list in the output below) and appends to it; no data is lost.
It might not be the most efficient on the page, but it's the most thorough, and you're not going to lose any information when you merge your 2 to N dicts.
Output: {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}
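The function and its inputs are missing from this copy. A sketch of a lossless N-way merge in the spirit of the description is shown below; the name merge_all is hypothetical, and the three input dicts are invented so that they reproduce the output quoted above.

```python
def merge_all(*dicts):
    """Deep-merge any number of dicts into a new dict.

    Conflicting non-dict values are collected in a list instead of
    being overwritten, so no data is lost.
    """
    result = {}
    for d in dicts:
        if not isinstance(d, dict):
            raise TypeError(f"expected dict, got {type(d).__name__}")
        for key, value in d.items():
            if key not in result:
                result[key] = value
            elif isinstance(result[key], dict) and isinstance(value, dict):
                result[key] = merge_all(result[key], value)
            elif isinstance(result[key], list):
                result[key] = result[key] + [value]
            else:
                result[key] = [result[key], value]
    return result


merged = merge_all({1: 1, 2: {1: 2}}, {1: 2, 2: {3: 1}}, {4: 4})
# merged == {1: [1, 2], 2: {1: 2, 3: 1}, 4: 4}
```

One limitation of this sketch: a value that was already a list in the input cannot be told apart from a list of collected conflicts.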
Since dict views support set operations, I was able to greatly simplify jterrace's answer.
Any attempt to combine a dict with a non-dict (technically, an object with a 'keys' method and an object without a 'keys' method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted, so I left it. You could easily catch an AttributeError thrown by the recursive call and then yield any value you please.
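The simplified code is not preserved here. A sketch of a generator merge built on dict-view set operations, which fails with AttributeError exactly as described, could look like this (the function name is hypothetical):

```python
def merge_views(d1, d2):
    """Yield (key, value) pairs of the deep merge, using dict-view set operations."""
    for k in d1.keys() - d2.keys():  # keys only in d1
        yield k, d1[k]
    for k in d2.keys() - d1.keys():  # keys only in d2
        yield k, d2[k]
    for k in d1.keys() & d2.keys():  # shared keys: both values must have .keys()
        yield k, dict(merge_views(d1[k], d2[k]))


dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
result = dict(merge_views(dict1, dict2))
# result == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D"}}

try:
    dict(merge_views({"k": {"x": 1}}, {"k": "leaf"}))
    raised = False
except AttributeError:
    raised = True  # "leaf" has no keys() method
```

In this sketch any shared key whose values are not both dict-like raises, including two plain leaves under the same key.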
The following function merges b into a.
The code will depend on your rules for resolving merge conflicts, of course. Here's a version which can take an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:
- Dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
- Later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a": 2}, and {"a": 3} in order, the result will be {"a": 3})
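The original function is missing from this copy. The two rules that survive in the examples above (a dict beats a non-dict, and later arguments beat earlier ones) could be implemented along these lines; the name merge_args is hypothetical.

```python
def merge_args(*args):
    """Deep-merge any number of dicts into a new dict without mutating the inputs."""
    result = {}
    for d in args:
        for key, value in d.items():
            current = result.get(key)
            if isinstance(current, dict) and isinstance(value, dict):
                result[key] = merge_args(current, value)
            elif isinstance(current, dict):
                pass  # an existing dict outranks a later non-dict value
            elif isinstance(value, dict):
                result[key] = merge_args(value)  # copy so the input stays untouched
            else:
                result[key] = value  # later argument wins among non-dicts
    return result


last_wins = merge_args({"a": 1}, {"a": 2}, {"a": 3})
# last_wins == {"a": 3}

dict_wins = merge_args({"foo": {"x": 1}}, {"foo": "bar"})
# dict_wins == {"foo": {"x": 1}}
```

Nested dicts are rebuilt on the way in, so changing the result afterwards does not touch any argument.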
And just another slight variation:
Here is a pure Python 3, set-based deep update function. It updates nested dictionaries by looping through one level at a time and calling itself to update each next level of dictionary values:
A simple example:
How about another answer? This one also avoids mutation/side effects:
Returning a merge without affecting the input dictionaries.
Inspired by @andrew cooke's solution.
If you are fine with giving values in the second dictionary priority over values in the first, then this can be done as easily as the following.
Example:
Output:
You can use it with functools.reduce() if you have more than two dictionaries to merge.
I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.
Considering the nested dictionaries as trees, what I wanted was:
- To update a so that every path to every leaf in b would be represented in a
- To overwrite subtrees of a if a leaf is found in the corresponding path in b
- To maintain the invariant that all b leaf nodes remain leaves
The existing answers were a little complicated for my taste and left some details on the shelf. I implemented the following, which passes unit tests for my data set.
Example (formatted for clarity):
The paths in b that needed to be maintained were:
1 -> 'b' -> 'white'
2 -> 'd' -> 'black'
3 -> 'e'
a had the unique and non-conflicting paths of:
1 -> 'a' -> 'red'
1 -> 'c' -> 'orange' -> 'dog'
so they are still represented in the merged map.
This is a solution I made that recursively merges dictionaries. The first dictionary passed to the function is the master dictionary: values in it will overwrite the values for the same key in the second dictionary.
This is a recursive solution, similar to the ones written above, but it doesn't raise any exceptions and simply merges the two dicts.
I've been testing your solutions and decided to use this one in my project:
Passing functions as parameters is key to extending jterrace's solution to behave like all the other recursive solutions.
The easiest way I can think of is:
Output:
I have another slightly different solution here:
By default it resolves conflicts in favor of values from the second dict, but you can easily override this; with some witchery you may even be able to throw exceptions out of it. :)