What is the best way to implement nested dictionaries?
I have a data structure which essentially amounts to a nested dictionary. Let's say it looks like this:
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}
Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower layer dictionaries via obnoxious try/catch blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.
I could also use tuples as keys, like such:
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}
This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).
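To make the trade-off concrete, here is a minimal sketch of state-level aggregation and subsetting on the tuple-keyed layout. The helper names totals_by_state and subset_for_state are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Flat layout from the question: (state, county, job) -> count
flat = {
    ('new jersey', 'mercer county', 'plumbers'): 3,
    ('new jersey', 'mercer county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'salesmen'): 62,
    ('new york', 'queens county', 'plumbers'): 9,
    ('new york', 'queens county', 'salesmen'): 36,
}

def totals_by_state(data):
    """Sum the counts for each state (first element of the key)."""
    totals = defaultdict(int)
    for (state, _county, _job), count in data.items():
        totals[state] += count
    return dict(totals)

def subset_for_state(data, wanted):
    """Return the (county, job) -> count entries for one state."""
    return {(county, job): count
            for (state, county, job), count in data.items()
            if state == wanted}
```

Every such operation has to unpack and filter the full key, which is the syntactic overhead the question refers to.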
Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.
How could I do this better?
Addendum: I'm aware of setdefault() but it doesn't really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.
Comments (22)
This is a bad idea, don't do it. Instead, use a regular dictionary and use dict.setdefault where apropos, so when keys are missing under normal usage you get the expected KeyError. If you insist on getting this behavior, here's how to shoot yourself in the foot:
Implement __missing__ on a dict subclass to set and return a new instance.
This approach has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:
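A minimal sketch of such a subclass, consistent with the description here. The class name Vividict comes from the text; the exact body is an assumption.

```python
class Vividict(dict):
    """dict subclass that creates missing nested levels on access."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the key is absent.
        # Store a fresh instance under the key and hand it back.
        value = self[key] = type(self)()
        return value


d = Vividict()
d['new jersey']['mercer county']['plumbers'] = 3
print(d)  # printed with the ordinary dict repr
```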
(Note self[key] is on the left-hand side of assignment, so there's no recursion here.)
And say you have some data:
Here's our usage code:
And now:
Criticism
A criticism of this type of container is that if the user misspells a key, our code could fail silently:
And additionally now we'd have a misspelled county in our data:
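The usage and the criticism can be sketched together as follows. The Vividict body is the assumed __missing__ implementation described earlier, and the misspelled key 'queens counyt' is made up for the demonstration.

```python
import pprint


class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


d = Vividict()
d['new jersey']['mercer county']['plumbers'] = 3
d['new jersey']['mercer county']['programmers'] = 81
d['new york']['queens county']['plumbers'] = 9

# Misspelled county: no KeyError is raised, we just get an empty Vividict.
typo_result = d['new york']['queens counyt']['plumbers']

# The misspelled county is now part of the data.
pprint.pprint(d)
```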
Explanation:
We're just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Returning the value assignment is useful because it avoids us additionally calling the getter on the dict, and unfortunately, we can't return it as it is being set.)
Note, these are the same semantics as the most upvoted answer but in half the lines of code - nosklo's implementation:
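For comparison, a reconstruction of what a __getitem__-based version might look like. The text only states that the AutoVivification object overrides __getitem__; the body below is an assumption, not a verbatim copy of that answer.

```python
class AutoVivification(dict):
    """Autovivifying dict built by overriding __getitem__ (reconstruction)."""

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Missing key: create, store and return a new nested instance.
            value = self[item] = type(self)()
            return value


a = AutoVivification()
a[1][2][3] = 4
a[1][3][3] = 5
```

Every lookup goes through the try/except here, whereas __missing__ is only invoked when the key is absent.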
Demonstration of Usage
Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.
Which outputs:
And as the last line shows, it pretty prints beautifully and in order for manual inspection. But if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution.
Other alternatives, for contrast:
dict.setdefault
Although the asker thinks this isn't clean, I find it preferable to the Vividict myself.
and now:
A misspelling would fail noisily, and not clutter our data with bad information:
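A short sketch of that behavior with a plain dict, assuming the data is populated through chained setdefault calls. The misspelled key is again made up.

```python
d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3
d.setdefault('new york', {}).setdefault('queens county', {})['plumbers'] = 9

try:
    d['new york']['queens counyt']['plumbers']
except KeyError as exc:
    # The lookup fails loudly and nothing is added to the data.
    missing = exc.args[0]
    print('KeyError:', missing)
```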
Additionally, I think setdefault works great when used in loops and you don't know what you're going to get for keys, but repetitive usage becomes quite burdensome, and I don't think anyone would want to keep up the following:
Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (or at least CPython) is rather smart about handling unused and unreferenced new instances, for example, it reuses the location in memory:
An auto-vivified defaultdict
This is a neat looking implementation, and usage in a script that you're not inspecting the data on would be as useful as implementing __missing__:
But if you need to inspect your data, the results of an auto-vivified defaultdict populated with data in the same way look like this:
This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.
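One possible way to do that conversion, sketched under the assumption that the auto-vivified defaultdict is the usual self-referencing factory. The names vivdict and to_plain_dict are hypothetical.

```python
from collections import defaultdict


def vivdict():
    """Auto-vivifying defaultdict: missing keys become nested vivdicts."""
    return defaultdict(vivdict)


def to_plain_dict(node):
    """Recursively copy nested mappings into ordinary dicts."""
    if isinstance(node, dict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node


d = vivdict()
d['new jersey']['mercer county']['plumbers'] = 3
d['new york']['queens county']['salesmen'] = 36

plain = to_plain_dict(d)
print(plain)  # ordinary dict repr, without the defaultdict wrappers
```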
Performance
Finally, let's look at performance. I'm subtracting the costs of instantiation.
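A rough sketch of how such a measurement could be set up with timeit. The workload (one three-level assignment) and the helper names are assumptions, and no timings are claimed here; results vary by machine and Python version.

```python
import timeit


class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


def fill_setdefault():
    d = {}
    d.setdefault('foo', {}).setdefault('bar', {})['baz'] = 1
    return d


def fill_vividict():
    d = Vividict()
    d['foo']['bar']['baz'] = 1
    return d


def best_time(func, repeat=3, number=10000):
    """Best-of-N wall time in seconds for `number` calls of func."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


# Subtract the cost of creating the empty container from each measurement.
t_setdefault = best_time(fill_setdefault) - best_time(dict)
t_vividict = best_time(fill_vividict) - best_time(Vividict)
print(t_setdefault, t_vividict)
```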
Based on performance, dict.setdefault works the best. I'd highly recommend it for production code, in cases where you care about execution speed.
If you need this for interactive use (in an IPython notebook, perhaps) then performance doesn't really matter - in which case, I'd go with Vividict for readability of the output. Compared to the AutoVivification object (which uses __getitem__ instead of __missing__, which was made for this purpose) it is far superior.
Conclusion
Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has benefits such as easy instantiation, and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.
Nevertheless, it has drawbacks: a misspelled key fails silently, and the bad key then stays in the data.
Thus I personally prefer setdefault to the other solutions, and have in every situation where I have needed this sort of behavior.
Testing:
Output:
Just because I haven't seen one this small, here's a dict that gets as nested as you like, no sweat:
You could create a YAML file and read it in using PyYAML.
Step 1: Create a YAML file, "employment.yml":
Step 2: Read it in Python
and now my_shnazzy_dictionary has all your values. If you needed to do this on the fly, you can create the YAML as a string and feed that into yaml.safe_load(...).
Since you have a star-schema design, you might want to structure it more like a relational table and less like a dictionary.
That kind of thing can go a long way to creating a data warehouse-like design without the SQL overheads.
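As a rough illustration of the table-like layout, here is a sketch that keeps one row per fact and aggregates by filtering. The Employment record, its field names and the total helper are all hypothetical.

```python
from collections import namedtuple

# One row per fact, much like a fact table in a star schema.
Employment = namedtuple('Employment', ['state', 'county', 'job', 'headcount'])

rows = [
    Employment('new jersey', 'mercer county', 'plumbers', 3),
    Employment('new jersey', 'mercer county', 'programmers', 81),
    Employment('new jersey', 'middlesex county', 'programmers', 81),
    Employment('new jersey', 'middlesex county', 'salesmen', 62),
    Employment('new york', 'queens county', 'plumbers', 9),
    Employment('new york', 'queens county', 'salesmen', 36),
]


def total(records, **criteria):
    """Sum headcounts over rows whose fields match all given criteria."""
    return sum(row.headcount for row in records
               if all(getattr(row, field) == value
                      for field, value in criteria.items()))


print(total(rows, state='new jersey'))
print(total(rows, job='programmers'))
```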
If the number of nesting levels is small, I use collections.defaultdict for this:
Using defaultdict like this avoids a lot of messy setdefault(), get(), etc.
This is a function that returns a nested dictionary of arbitrary depth:
Use it like this:
Iterate through everything with something like this:
This prints out:
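A sketch of the whole sequence (build, populate, traverse), assuming the function is the usual recursive defaultdict factory. The names make_dict and format_tree are hypothetical.

```python
from collections import defaultdict


def make_dict():
    """Return a defaultdict whose missing values are more of the same."""
    return defaultdict(make_dict)


def format_tree(node, depth=0):
    """Return indented lines, one per key; leaves are shown as 'key: value'."""
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append('  ' * depth + str(key))
            lines.extend(format_tree(value, depth + 1))
        else:
            lines.append('  ' * depth + f'{key}: {value}')
    return lines


d = make_dict()
d['new jersey']['mercer county']['plumbers'] = 3
d['new york']['queens county']['plumbers'] = 9

lines = format_tree(d)
print('\n'.join(lines))
```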
You might eventually want to make it so that new items cannot be added to the dict. It's easy to recursively convert all these defaultdicts to normal dicts.
As others have suggested, a relational database could be more useful to you. You can use an in-memory sqlite3 database as a data structure to create tables and then query them.
This is just a simple example. You could define separate tables for states, counties and job titles.
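A minimal sketch of the single-table variant using the standard sqlite3 module. The table name jobs and its column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE jobs (state TEXT, county TEXT, title TEXT, headcount INTEGER)'
)
conn.executemany(
    'INSERT INTO jobs VALUES (?, ?, ?, ?)',
    [
        ('new jersey', 'mercer county', 'plumbers', 3),
        ('new jersey', 'mercer county', 'programmers', 81),
        ('new jersey', 'middlesex county', 'programmers', 81),
        ('new jersey', 'middlesex county', 'salesmen', 62),
        ('new york', 'queens county', 'plumbers', 9),
        ('new york', 'queens county', 'salesmen', 36),
    ],
)

# Aggregate by state.
by_state = dict(conn.execute(
    'SELECT state, SUM(headcount) FROM jobs GROUP BY state'
))

# Aggregate one job title across all states and counties.
programmers = conn.execute(
    'SELECT SUM(headcount) FROM jobs WHERE title = ?', ('programmers',)
).fetchone()[0]

conn.close()
```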
I find setdefault quite useful; it checks if a key is present and adds it if not:
setdefault always returns the value stored under the relevant key, so you are actually updating the values of 'd' in place.
When it comes to iterating, I'm sure you could write a generator easily enough if one doesn't already exist in Python:
collections.defaultdict can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.
You can use Addict: https://github.com/mewwts/addict
As for "obnoxious try/catch blocks":
yields
You can use this to convert from your flat dictionary format to structured format:
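A sketch of that conversion, assuming the chained setdefault idiom is what the answer refers to. The variable names flat and nested are illustrative.

```python
flat = {
    ('new jersey', 'mercer county', 'plumbers'): 3,
    ('new jersey', 'mercer county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'programmers'): 81,
    ('new jersey', 'middlesex county', 'salesmen'): 62,
    ('new york', 'queens county', 'plumbers'): 9,
    ('new york', 'queens county', 'salesmen'): 36,
}

nested = {}
for (state, county, job), count in flat.items():
    # Each setdefault returns the existing sub-dict or inserts a new one.
    nested.setdefault(state, {}).setdefault(county, {})[job] = count
```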
defaultdict() is your friend!
For a two dimensional dictionary you can do:
For more dimensions you can:
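Both cases might look like the sketch below. The exact forms are assumptions; the names grid and cube are illustrative.

```python
from collections import defaultdict

# Two dimensions: the inner defaultdict has no default_factory, so only the
# first level is auto-created; below that it behaves like a plain dict.
grid = defaultdict(defaultdict)
grid['row']['col'] = 1

# More dimensions: nest the factories, one lambda per extra level.
cube = defaultdict(lambda: defaultdict(dict))
cube['x']['y']['z'] = 2

try:
    grid['row']['missing']
    inner_raises = False
except KeyError:
    # Reading a missing second-level key still raises, as with a normal dict.
    inner_raises = True
```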
For easy iterating over your nested dictionary, why not just write a simple generator?
So then, if you have your complicated nested dictionary, iterating over it becomes simple:
Obviously your generator can yield whatever format of data is useful to you.
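One possible generator, sketched here to yield each leaf together with the path of keys leading to it. The name iter_leaves and the (path, value) output format are assumptions.

```python
def iter_leaves(node, path=()):
    """Yield (path, value) for every leaf of a nested dict, depth first."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


data = {
    'new jersey': {
        'mercer county': {'plumbers': 3, 'programmers': 81},
        'middlesex county': {'programmers': 81, 'salesmen': 62},
    },
    'new york': {'queens county': {'plumbers': 9, 'salesmen': 36}},
}

for path, count in iter_leaves(data):
    print(path, count)
```

Because the paths are tuples, dict(iter_leaves(data)) gives the flat tuple-keyed form from the question.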
Why are you using try catch blocks to read the tree? It's easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:
Or, a perhaps somewhat verbose method, is to use the get method:
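Both read styles are sketched below under the assumption of a three-level state/county/job layout. The function names are hypothetical, and returning None for a missing entry is a design choice made for this example.

```python
def count_with_guards(data, state, county, job):
    """Return the count, or None if any level is missing."""
    if state not in data:
        return None
    if county not in data[state]:
        return None
    if job not in data[state][county]:
        return None
    return data[state][county][job]


def count_with_get(data, state, county, job):
    """Same lookup using chained get() calls with empty-dict defaults."""
    return data.get(state, {}).get(county, {}).get(job)


data = {'new york': {'queens county': {'plumbers': 9, 'salesmen': 36}}}
```

Neither function modifies data when a key is missing.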
But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which has been part of the standard library since Python 2.5.
I'm making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.
I like the idea of wrapping this in a class and implementing __getitem__ and __setitem__ such that they implemented a simple query language:
If you wanted to get fancy you could also implement something like:
but mostly I think such a thing would be really fun to implement :D
Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, select subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.
Example:
Edit: Now returning dictionaries when querying with wild cards (None), and single values otherwise.
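A hypothetical sketch of a container with that behavior: keys are tuples, and None in a lookup key acts as a wild card. The class name JobTable and its internals are assumptions, not the answer's actual code.

```python
class JobTable:
    """Tuple-keyed store; None in a lookup key matches any value."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        if None not in key:
            # Fully specified key: return the single value (or raise KeyError).
            return self._data[key]
        # Wild-card query: return a dict of all matching entries.
        return {
            stored: value
            for stored, value in self._data.items()
            if all(want is None or want == got
                   for want, got in zip(key, stored))
        }


table = JobTable()
table['new jersey', 'mercer county', 'plumbers'] = 3
table['new jersey', 'mercer county', 'programmers'] = 81
table['new york', 'queens county', 'plumbers'] = 9
```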
I have a similar thing going. I have a lot of cases where I do:
But going many levels deep. It's the ".get(item, {})" that's the key as it'll make another dictionary if there isn't one already. Meanwhile, I've been thinking of ways to deal with this better. Right now, there's a lot of
So instead, I made:
Which has the same effect if you do:
Better? I think so.
You can use recursion in lambdas and defaultdict, no need to define names:
Here's an example:
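One way to write this, sketched as an assumption about what the answer intends: a self-applying lambda supplies the recursive factory, so no named function is needed.

```python
from collections import defaultdict

# f(f) returns a zero-argument factory; calling that factory builds a
# defaultdict whose own default_factory is again f(f).
tree = defaultdict((lambda f: f(f))(lambda g: lambda: defaultdict(g(g))))

tree['new jersey']['mercer county']['plumbers'] = 3
tree['new york']['queens county']['salesmen'] = 36
```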
I used to use this function. It's safe, quick, and easily maintainable.
Example:
For the following (copied from above), is there a way to implement the append function? I am trying to use a nested dictionary to store values as arrays.
My current implementation is as follows:
The NestedDict class from the open-source ndicts package (I am the author) tries to mitigate the pain of dealing with nested dictionaries. I think it ticks all the boxes that the question asks for.
Here you have a summary of its capabilities; for more details, check the documentation.
Initialize
Get items
Think of a NestedDict as if it was a flattened dictionary.
At the same time, you can get intermediate nodes, not just the leaf values.
If a key is not present, an exception is thrown.
Set items
As in a normal dictionary, if a key is missing it is added to the NestedDict.
This allows you to start with an empty NestedDict which can be vivified by setting new items.
Iterate
When it comes to iteration, think of a NestedDict as a flattened dictionary. The familiar .keys(), .values() and .items() methods are available.