Python large variable RAM usage

Posted on 2024-08-29 14:35:32


Say there is a dict variable that grows very large during runtime - up into millions of key:value pairs.

Does this variable get stored in RAM, effectively using up all the available memory and slowing down the rest of the system?

Asking the interpreter to display the entire dict is a bad idea, but would it be okay as long as one key is accessed at a time?


Comments (4)

风尘浪孓 2024-09-05 14:35:32


Yes, the dict will be stored in the process memory. So if it gets large enough that there's not enough room in the system RAM, then you can expect to see massive slowdown as the system starts swapping memory to and from disk.

Others have said that a few million items shouldn't pose a problem; I'm not so sure. The dict overhead itself (before counting the memory taken by the keys and values) is significant. For Python 2.6 or later, sys.getsizeof gives some useful information about how much RAM various Python structures take up. Some quick results, from Python 2.6 on a 64-bit OS X machine:

>>> from sys import getsizeof
>>> getsizeof(dict((n, 0) for n in range(5462)))/5462.
144.03368729403149
>>> getsizeof(dict((n, 0) for n in range(5461)))/5461.
36.053470060428495

So the dict overhead varies between 36 bytes per item and 144 bytes per item on this machine (the exact value depending on how full the dictionary's internal hash table is; here 5461 = 2**14//3 is one of the thresholds where the internal hash table is enlarged). And that's before adding the overhead for the dict items themselves; if they're all short strings (6 characters or less, say) then that still adds another >= 80 bytes per item (possibly less if many different keys share the same value).

So it wouldn't take that many million dict items to exhaust RAM on a typical machine.
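
For comparison, here is a minimal sketch of the same measurement on Python 3 (assuming CPython; the per-item figures come out smaller there because the dict layout is more compact, but the dict structure alone still costs tens of bytes per entry before the keys and values are counted):

import sys

# Per-item size of the dict's own bookkeeping; sys.getsizeof does not
# count the key and value objects the dict references.
for n in (5461, 5462, 1_000_000):
    d = {i: 0 for i in range(n)}
    print(n, sys.getsizeof(d) / n)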

久随 2024-09-05 14:35:32


The main concern with the millions of items is not the dictionary itself so much as how much space each of these items takes up. Still, unless you're doing something weird, they should probably fit.

If you've got a dict with millions of keys, though, you're probably doing something wrong. You should do one or both of:

  1. Figure out what data structure you should actually be using, because a single dict is probably not the right answer. Exactly what this would be depends on what you're doing.

  2. Use a database. Your Python should come with a sqlite3 module, so that's a start.
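
As a rough illustration of the second option, here is a minimal sqlite3 sketch (the file and table names are made up for this example) that keeps the key:value pairs on disk and pulls in only the rows you actually ask for:

import sqlite3

# Store key:value pairs on disk instead of holding them all in a dict.
conn = sqlite3.connect("pairs.db")
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

with conn:  # commit the inserts as one transaction
    conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                     ((str(i), str(i * i)) for i in range(1000)))

# Looking up one key loads only the matching row, not the whole table.
row = conn.execute("SELECT value FROM kv WHERE key = ?", ("42",)).fetchone()
print(row[0])
conn.close()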

明明#如月 2024-09-05 14:35:32


Yes, a Python dict is stored in RAM. A few million keys isn't an issue for modern computers, however. If you need more and more data and RAM is running out, consider using a real database. Options include a relational DB like SQLite (built into Python, by the way) or a key-value store like Redis.

It makes little sense to display millions of items in the interpreter, but accessing a single element should still be very efficient.
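
To make that last point concrete, here is a small timing sketch (the sizes are made up for illustration, and building the large dict itself takes a sizeable chunk of RAM) showing that looking up a single key costs about the same whether the dict holds a thousand entries or millions:

import timeit

small = {i: i for i in range(1_000)}
large = {i: i for i in range(5_000_000)}

# Time one million lookups of the same key in each dict.
print(timeit.timeit("d[500]", globals={"d": small}, number=1_000_000))
print(timeit.timeit("d[500]", globals={"d": large}, number=1_000_000))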

英雄似剑 2024-09-05 14:35:32


As far as I know, Python's dict uses a well-tuned hash table, so you are probably going to get about the best memory efficiency and performance you can hope for. Now, whether the whole thing is kept in RAM or committed to a swap file is up to your OS and depends on the amount of RAM you have.
What I'd say is that it's best to just try it:

# Python 2 snippet: build a dict with ten million integer keys and values
# (on Python 3, use range instead of xrange).
a = {}
for i in xrange(10*10**6):
    a[i] = i

How does this look when you run it? It takes about 350 MB on my system, which is manageable, to say the least.
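
If you are on Python 3, here is a minimal sketch of the same experiment using tracemalloc from the standard library (the exact figure will differ from the number above depending on interpreter version and platform):

import tracemalloc

tracemalloc.start()
a = {i: i for i in range(10 * 10**6)}  # ten million int keys and values
current, peak = tracemalloc.get_traced_memory()
print("peak: %.0f MiB" % (peak / 2**20))
tracemalloc.stop()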
