是否可以给 python dict 一个初始容量(并且有用)

发布于 2024-09-05 09:18:22 字数 253 浏览 3 评论 0原文

我正在用大约 10,000,000 个项目填充 python 字典。我对字典(或哈希表)的理解是,当太多元素进入其中时,需要调整大小,这个操作会花费相当长的时间。

有没有办法对 python 字典说你将在其中存储至少 n 个项目,以便它可以从一开始就分配内存?或者说这个优化对我的运行速度没有任何好处?

(不,我还没有检查我的小脚本的缓慢是否是因为这个,我实际上现在不知道如何做到这一点。然而,这是我在Java中要做的事情,正确设置HashSet的初始容量)

I am filling a python dict with around 10,000,000 items. My understanding of dict (or hashtables) is that when too much elements get in them, the need to resize, an operation that cost quite some time.

Is there a way to say to a python dict that you will be storing at least n items in it, so that it can allocate memory from the start? Or will this optimization not do any good to my running speed?

(And no, I have not checked that the slowness of my small script is because of this, I actually wouldn't now how to do that. This is however something I would do in Java, set the initial capacity of the HashSet right)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

怪我闹别瞎闹 2024-09-12 09:18:22

首先,我听说有传言说你可以在初始化时设置字典的大小,但我从未见过任何文档或 PEP 描述如何做到这一点。

考虑到这一点,我对你们的物品数量进行了分析,如下所述。虽然每次调整字典大小可能需要一些时间,但我建议您不要担心,至少在您可以测试其性能之前继续进行。

确定调整大小时涉及的两个规则是元素数量和调整大小因子。当添加元素使其超过 2/3 标记时,字典将在 2/3 满时自行调整大小。低于 50,000 个元素,它将增加 4 倍,高于该数量则增加 2 倍。使用您对 10,000,000 个元素的估计(在 2^23 和 2^24 之间),您的字典将调整自身大小 15 倍(低于 50k 的 7 倍,以上 8 倍)。另一次调整大小将发生在刚刚超过 11,100,000 时。

调整哈希表中当前元素的大小和替换确实需要一些时间,但我想知道您是否会注意到它以及附近代码中进行的其他操作。我只是整理了一个计时套件,比较字典大小为 2^3 到 2^24 的每个边界上五个位置的插入,“边界”添加平均比“非边界”添加长 0.4 纳秒。这长了 0.17%...可能可以接受。所有操作的最小值为 0.2085 微秒,最大值为 0.2412 微秒。

希望这是有洞察力的,如果您确实检查了代码的性能,请进行后续编辑!我关于字典内部结构的主要资源是 Brandon Rhodes 在 PyCon 2010 上的精彩演讲:The强大的词典

First off, I've heard rumor that you can set the size of a dictionary at initialization, but I have never seen any documentation or PEP describing how this would be done.

With this in mind I ran an analysis on your quantity of items, described below. While it may take some time to resize the dictionary each time I would recommend moving ahead without worrying about it, at least until you can test its performance.

The two rules that concern us in determining resizing is number of elements and factor of resizing. A dictionary will resize itself when it is 2/3 full on the addition of the element putting it over the 2/3 mark. Below 50,000 elements it will increase by a factor of 4, above that amount by a factor of 2. Using your estimate of 10,000,000 elements (between 2^23 and 2^24) your dictionary will resize itself 15 times (7 times below 50k, 8 times above). Another resize would occur just past 11,100,000.

Resizing and replacing the current elements in the hashtable does take some time, but I wonder if you'd notice it with whatever else you have going on in the code nearby. I just put together a timing suite comparing inserts at five places along each boundary from dictionary sizes of 2^3 through 2^24, and the "border" additions average 0.4 nanoseconds longer than the "non-border" additions. This is 0.17% longer... probably acceptable. The minimum for all operations was 0.2085 microseconds, and max was 0.2412 microseconds.

Hope this is insightful, and if you do check the performance of your code please follow-up with an edit! My primary resource for dictionary internals was the splendid talk given by Brandon Rhodes at PyCon 2010: The Mighty Dictionary

街角迷惘 2024-09-12 09:18:22

是的,你可以,这是我在另一个人的问题中找到的一个解决方案,也与你的问题相关:

d = {}
for i in xrange(4000000):
d[i] = None
# 722ms

d = dict(itertools.izip(xrange(4000000), itertools.repeat(None)))
# 634ms

dict.fromkeys(xrange(4000000))
# 558ms

s = set(xrange(4000000))
dict.fromkeys(s)
# Not including set construction 353ms

这些是初始化具有一定大小的字典的不同方法。

Yes you can and here is a solution I found in another person's question that is related to yours too:

d = {}
for i in xrange(4000000):
d[i] = None
# 722ms

d = dict(itertools.izip(xrange(4000000), itertools.repeat(None)))
# 634ms

dict.fromkeys(xrange(4000000))
# 558ms

s = set(xrange(4000000))
dict.fromkeys(s)
# Not including set construction 353ms

those are different ways to initialize a dictionary with a certain size.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文