Python字典能力会动态增加吗？

发布于 2025-01-24 08:21:39 字数 572 浏览 0 评论 0原文

假设在Python3中，我们使用了一个字典，例如：

my_dct = {}
...
for i in range(100):
  my_dct[i] = True
...
for s in "potentially long string":
  my_dct[s] = True

我的问题是，当我们通过Python解释器运行上述Python程序时，解释器在第一行会创建什么容量容器？根据这一大小，循环的第一或第二个过程中的容量会增加吗？

此链接显示容量显示容量8。 =“ https://github.com/python/cpython/cpython/blob/main/objects/dictobject.c” rel =“ nofollow noreferrer”> this 根据类型，128、2^15等显示。

非常感谢任何解释，谢谢。

原文

Let's say in python3, we used a dictionary as:

my_dct = {}
...
for i in range(100):
  my_dct[i] = True
...
for s in "potentially long string":
  my_dct[s] = True

My question is when we run the above python program through the python interpreter, what capacity container does the interpreter create at the first line? Depending on this size, does the capacity increase during the course of the first or second for loop?

This link shows capacity 8. While this shows, according to the type, 128, 2^15, etc. as the capacity of the container upon the instantiation of a dictionary in python.

Would really appreciate any explanation, thank you.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

夏至、离别 2025-01-31 08:21:39

dict存储如何随着增长而调整是一个实现细节，大多数用户不需要知道。如果您是开发人员，或者想在C中编写自己的代码，则详细信息可能很有用。但是，对于在列表和dicts之间进行选择，它们无关紧要。

列表和dicts之间的重要区别在于它们的索引方式。列表由整数（或切片）索引，并且“包含”所有值，直到其len。一个键是由键索引的，这可以是任何可用对象。字符串最常见。

In [1]: import sys

从表面上看，这两个对象是等效的：

In [2]: adict = {i:True for i in range(100)}
In [3]: alist = [i for i in range(100)]

两个都有len（） 100，并且可以以相同的方式进行索引：

In [4]: adict[50]
Out[4]: True
In [5]: alist[50]
Out[5]: 50

糟糕，我的意思是使用alist = [thue in range in range（100）），100 true值的列表，而不是list（range（100））。但这不会改变以下讨论。

一个关键区别是列表索引必须是连续的。那就是我们要使用alist [200]，我们将不得不在列表中添加另外101个值。而adict [200] = false简单地将200添加到哈希表中，而新的len 101。

基础存储非常不同。

alist具有一个C数组，其中包含100个值的参考，再加上一些“头部空间”，以允许使用alist.Append（...）快速增长。

ADICT具有哈希表，该表以某种方式存储了对密钥和值的引用。

在这两种情况下，随着它们的成长，存储（“容器”？）填充时必须重新分配。同样，详细信息取决于实现，并且用户在很大程度上看不见 - 除非我们仔细跟踪内存或时间。

我已经看到了许多问题，这些问题尝试比较列表和数组的getizeof。答案总是指出，这种比较的价值有限，因为它测量了截然不同的事情。

In [8]: sys.getsizeof(alist)
Out[8]: 904
In [9]: sys.getsizeof(adict)
Out[9]: 4696

对于list，getsizeof基本上是存储价值引用的C数组的大小。对于100的len（alist），即8*100字节，大约13个值的增长空间 - 此时必须重新分配。

添加6个值不会改变大小：

In [11]: alist.extend([True]*6)
In [12]: len(alist)
Out[12]: 106
In [13]: sys.getsizeof(alist)
Out[13]: 904

再增加6个，强制一个重新安装，具有较大的头部空间（用于27个参考）：

In [14]: alist.extend([True]*6)
In [15]: len(alist)
Out[15]: 112
In [16]: sys.getsizeof(alist)
Out[16]: 1112

我假设getsizeof（adict）测量了标签的大小。那要大得多。

将12个密钥添加到ADICT不会更改大小。有足够的空间来处理这些值而没有冲突。

In [21]: adict.update({i:True for i in range(100,106)})
In [22]: len(adict)
Out[22]: 106
In [23]: sys.getsizeof(adict)
Out[23]: 4696
In [24]: adict.update({i:True for i in range(106,112)})
In [25]: sys.getsizeof(adict)
Out[25]: 4696

对于总记忆脚打印，我们还必须考虑键和值的内存使用。在这些示例中，这并不多，因为所有引用true引用同一对象。并且最多255个int也是统治和独特的。但是钥匙可以是字符串，元组或更复杂的对象。值可以是一个对象，尽管将它们添加到列表或DICT确实会增加内存使用。它们是通过引用而不是价值存储的。

How the dict storage adjusts as it grows is an implementation detail, which most users don't need to know. If you are developer, or want to write your own code in C, the details may be useful. But for choosing between lists and dicts, they don't matter.

The important difference between lists and dicts is how they are indexed. A list is indexed by integers (or slices), and 'contains' all values up to its len. A dict is indexed by keys, which can be any hashable object. Strings are most common.

In [1]: import sys

On the surface these two objects are equivalent:

In [2]: adict = {i:True for i in range(100)}
In [3]: alist = [i for i in range(100)]

Both have len() 100, and can be indexed the same way:

In [4]: adict[50]
Out[4]: True
In [5]: alist[50]
Out[5]: 50

Oops, I meant to use alist = [True for i in range(100)], a list of 100 True values, rather than list(range(100)). But that doesn't change the following discussion.

A key difference is that the list indices have to be consecutive. That is if we want to use alist[200], we will have to add another 101 values to the list. Whereas adict[200]=False simply adds the 200 to the hash table, with a new len of 101.

Underlying storage is quite different.

alist has a C array that contains references for 100 values, plus some 'head room' to allow for fast growth with alist.append(...).

adict has a hash table, that somehow stores references to the keys and the values.

In both cases, as they grow, the storage ('container'?) will have to be reallocated when it fills up. Again the details are implementation dependent, and largely invisible to users - unless we carefully track memory or times.

I've seen a number of SO questions that try to compare the getsizeof of list and arrays. The answers always point out that this comparison has limited value, since it measures very different things.

In [8]: sys.getsizeof(alist)
Out[8]: 904
In [9]: sys.getsizeof(adict)
Out[9]: 4696

For a list, getsizeof is basically the size of the C array that stores value references. For len(alist) of 100, that's 8*100 bytes, with roughly growth space for 13 more values - at which point it will have to be reallocated.

Adding 6 values does not change size:

In [11]: alist.extend([True]*6)
In [12]: len(alist)
Out[12]: 106
In [13]: sys.getsizeof(alist)
Out[13]: 904

Adding another 6, forces a reallocation, with a larger head space (for 27 references):

In [14]: alist.extend([True]*6)
In [15]: len(alist)
Out[15]: 112
In [16]: sys.getsizeof(alist)
Out[16]: 1112

I assume the getsizeof(adict) measures the size of the hashtable. That's quite a bit larger.

Adding 12 more keys to adict doesn't change the size. There's enough space to handle these values without clashes.

In [21]: adict.update({i:True for i in range(100,106)})
In [22]: len(adict)
Out[22]: 106
In [23]: sys.getsizeof(adict)
Out[23]: 4696
In [24]: adict.update({i:True for i in range(106,112)})
In [25]: sys.getsizeof(adict)
Out[25]: 4696

For total memory foot print we also have to consider the memory use of the keys and values. In these examples, that isn't much, since all references to True reference the same object. And ints up to 255 are also preallocated and unique. But keys can be strings, tuples, or more complex objects. And values can be an object, though adding them to a list or dict does increase memory use. They are stored by reference, not value.

回复收藏 0 原文

~没有更多了~