Python: Do set classes "leak" when items are removed, like dicts?

Posted 2024-08-23 04:58:58

I know that Python dicts will "leak" when items are removed (because the item's slot will be overwritten with the magic "removed" value)… But will the set class behave the same way? Is it safe to keep a set around, adding and removing stuff from it over time?

Edit: Alright, I've tried it out, and here's what I found:

>>> import gc
>>> gc.collect()
0
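### Note: this session assumes Python 2, where range() returns a list;
### on Python 3 the equivalent is list(range(1000000)).
>>> nums = range(1000000)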
>>> gc.collect()
0
### rsize: 20 megs
### A baseline measurement
>>> s = set(nums)
>>> gc.collect()
0
### rsize: 36 megs
>>> for n in nums: s.remove(n)
>>> gc.collect()
0
### rsize: 36 megs
### Memory usage doesn't drop after removing every item from the set…
>>> s = None
>>> gc.collect()
0
### rsize: 20 megs
### … but nulling the reference to the set *does* free the memory.
>>> s = set(nums)
>>> for n in nums: s.remove(n)
>>> for n in nums: s.add(n)
>>> gc.collect()
0
### rsize: 36 megs
### Removing then re-adding keys uses a constant amount of memory…
>>> for n in nums: s.remove(n)
>>> for n in nums: s.add(n+1000000)
>>> gc.collect()
0
### rsize: 47 megs
### … but adding new keys uses more memory.
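
A rough way to repeat part of this experiment without watching the process rsize is to ask the set object itself how big it is. The following is only a sketch, assuming CPython 3 and a smaller item count (`N` is an arbitrary choice, not from the session above). `sys.getsizeof` reports the set object and its hash table, not the integers it refers to, so the numbers will not match the rsize figures.

```python
import sys

N = 100_000  # arbitrary size chosen for this sketch
nums = list(range(N))

empty_size = sys.getsizeof(set())   # size of a brand-new empty set

s = set(nums)
full_size = sys.getsizeof(s)        # size once the table has grown

for n in nums:
    s.remove(n)
emptied_size = sys.getsizeof(s)     # size after every item was removed

print("empty set:         ", empty_size)
print("with all items:    ", full_size)
print("after removing all:", emptied_size)
```

If the last number stays close to the second one rather than dropping back to the first, the set is holding on to its table after removals, which would be consistent with the rsize readings above.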

Comments (2)

生生漫 2024-08-30 04:58:58

Yes, set is basically a hash table just like dict -- the differences at the interface don't imply many differences "below" it. Once in a while, you should copy the set -- myset = set(myset) -- just like you should for a dict on which many additions and removals are regularly made over time.
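
A minimal sketch of that copy step, assuming CPython 3. The `compact` helper name and the item counts are made up for illustration; the only technique used is the `set(myset)` rebuild mentioned above, with `sys.getsizeof` to compare the table size before and after.

```python
import sys

def compact(s):
    """Return a fresh set with the same items (hypothetical helper).

    The old set, and its oversized table, can be freed once nothing
    references it any more.
    """
    return set(s)

# Grow a set, then remove almost everything from it.
myset = set(range(100_000))
for n in range(99_990):
    myset.remove(n)

before = sys.getsizeof(myset)
myset = compact(myset)      # same idea as: myset = set(myset)
after = sys.getsizeof(myset)

print("before rebuild:", before)
print("after rebuild: ", after)
```

How often to rebuild is a judgment call; rebuilding costs time proportional to the number of items still in the set.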

倥絔 2024-08-30 04:58:58

For questions like these it is often best to run a quick experiment like this one and see what happens:

s = set()
for a in range(1000):
  for b in range(10000000):
    s.add(b)
  for b in range(10000000):
    s.remove(b)

What the docs and other people say is often at odds with the actual behaviour. If this matters to you, test it. Don't rely on others.
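
The loop above only shows something if you watch the process from outside while it runs. A scaled-down variant that records a number after each cycle might look like the sketch below. It assumes CPython 3; `CYCLES` and `N` are much smaller than in the loop above so it finishes quickly, and `sys.getsizeof` measures only the set's own table.

```python
import sys

CYCLES = 5      # far fewer than the 1000 cycles above, to keep it quick
N = 50_000      # far fewer items than above, for the same reason

s = set()
sizes = []
for cycle in range(CYCLES):
    for b in range(N):
        s.add(b)
    for b in range(N):
        s.remove(b)
    # Record the set's own size once it is empty again.
    sizes.append(sys.getsizeof(s))

print(sizes)
```

If the recorded sizes keep climbing from one cycle to the next, that would point to a leak; if they level off, the table is being reused.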
