Python large variable RAM usage

Posted on 2024-08-29 14:35:32


Say there is a dict variable that grows very large during runtime - up into millions of key:value pairs.

Does this variable get stored in RAM, effectively using up all the available memory and slowing down the rest of the system?

Asking the interpreter to display the entire dict is a bad idea, but would it be okay as long as one key is accessed at a time?


Comments (4)

风尘浪孓 2024-09-05 14:35:32


Yes, the dict will be stored in the process memory. So if it gets large enough that there's not enough room in the system RAM, then you can expect to see massive slowdown as the system starts swapping memory to and from disk.

Others have said that a few million items shouldn't pose a problem; I'm not so sure. The dict overhead itself (before counting the memory taken by the keys and values) is significant. For Python 2.6 or later, sys.getsizeof gives some useful information about how much RAM various Python structures take up. Some quick results, from Python 2.6 on a 64-bit OS X machine:

>>> from sys import getsizeof
>>> getsizeof(dict((n, 0) for n in range(5462)))/5462.
144.03368729403149
>>> getsizeof(dict((n, 0) for n in range(5461)))/5461.
36.053470060428495

So the dict overhead varies between 36 bytes per item and 144 bytes per item on this machine (the exact value depending on how full the dictionary's internal hash table is; here 5461 = 2**14//3 is one of the thresholds where the internal hash table is enlarged). And that's before adding the overhead for the dict items themselves; if they're all short strings (6 characters or less, say) then that still adds another >= 80 bytes per item (possibly less if many different keys share the same value).

So it wouldn't take that many million dict items to exhaust RAM on a typical machine.
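
For comparison, here is a minimal sketch of the same measurement on Python 3 (assuming CPython; the per-item figures come out smaller there because the dict layout is more compact, but the dict structure alone still costs tens of bytes per entry before the keys and values are counted):

import sys

# Per-item size of the dict's own bookkeeping; sys.getsizeof does not
# count the key and value objects the dict references.
for n in (5461, 5462, 1_000_000):
    d = {i: 0 for i in range(n)}
    print(n, sys.getsizeof(d) / n)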

久随 2024-09-05 14:35:32


The main concern with the millions of items is not the dictionary itself so much as how much space each of these items takes up. Still, unless you're doing something weird, they should probably fit.

If you've got a dict with millions of keys, though, you're probably doing something wrong. You should do one or both of:

  1. Figure out what data structure you should actually be using, because a single dict is probably not the right answer. Exactly what this would be depends on what you're doing.

  2. Use a database. Your Python should come with a sqlite3 module, so that's a start.
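
As a rough illustration of the second option, here is a minimal sqlite3 sketch (the file and table names are made up for this example) that keeps the key:value pairs on disk and pulls in only the rows you actually ask for:

import sqlite3

# Store key:value pairs on disk instead of holding them all in a dict.
conn = sqlite3.connect("pairs.db")
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

with conn:  # commit the inserts as one transaction
    conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                     ((str(i), str(i * i)) for i in range(1000)))

# Looking up one key loads only the matching row, not the whole table.
row = conn.execute("SELECT value FROM kv WHERE key = ?", ("42",)).fetchone()
print(row[0])
conn.close()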

明明#如月 2024-09-05 14:35:32


Yes, a Python dict is stored in RAM. A few million keys isn't an issue for modern computers, however. If you need more and more data and RAM is running out, consider using a real database. Options include a relational DB like SQLite (built into Python, by the way) or a key-value store like Redis.

It makes little sense to display millions of items in the interpreter, but accessing a single element should still be very efficient.
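
To make that last point concrete, here is a small timing sketch (the sizes are made up for illustration, and building the large dict itself takes a sizeable chunk of RAM) showing that looking up a single key costs about the same whether the dict holds a thousand entries or millions:

import timeit

small = {i: i for i in range(1_000)}
large = {i: i for i in range(5_000_000)}

# Time one million lookups of the same key in each dict.
print(timeit.timeit("d[500]", globals={"d": small}, number=1_000_000))
print(timeit.timeit("d[500]", globals={"d": large}, number=1_000_000))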

英雄似剑 2024-09-05 14:35:32


As far as I know, Python's dict uses a well-tuned hash table, so you are probably going to get about the best memory efficiency and performance you can hope for. Now, whether the whole thing is kept in RAM or committed to a swap file is up to your OS and depends on the amount of RAM you have.
What I'd say is that it's best to just try it:

# Python 2 snippet: build a dict with ten million integer keys and values
# (on Python 3, use range instead of xrange).
a = {}
for i in xrange(10*10**6):
    a[i] = i

How does this look when you run it? It takes about 350 MB on my system, which is manageable, to say the least.
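
If you are on Python 3, here is a minimal sketch of the same experiment using tracemalloc from the standard library (the exact figure will differ from the number above depending on interpreter version and platform):

import tracemalloc

tracemalloc.start()
a = {i: i for i in range(10 * 10**6)}  # ten million int keys and values
current, peak = tracemalloc.get_traced_memory()
print("peak: %.0f MiB" % (peak / 2**20))
tracemalloc.stop()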
