CPython internal structures

GAE has various limitations, one of which is that the largest allocatable block of memory is 1 MB (now ten times larger, but that doesn't change the question). This limitation means that one cannot put more than a certain number of items in a list(), because CPython tries to allocate a contiguous block of memory for the element pointers. Having huge list()s can be considered bad programming practice, but even if no huge structure is created in the program itself, CPython maintains some behind the scenes.

It appears that CPython is maintaining a single global list of objects or something similar. That is, an application with many small objects tends to allocate bigger and bigger single blocks of memory.

My first idea was the gc; disabling it changes the application's behavior a bit, but some structures are still maintained.

The simplest short application that experiences the issue is:

a = b = []            # 'a' and 'b' start as two names for the same empty list
number_of_lists = 8000000
for i in xrange(number_of_lists):   # Python 2; use range() on Python 3
    b.append([])      # nest a fresh empty list inside the current one
    b = b[0]          # descend into it, building an 8000000-deep chain

Can anyone enlighten me on how to prevent CPython from allocating huge internal structures when an application has many objects?

Comments (4)

奶茶白久 2024-07-20 13:46:46

On a 32-bit system, each of the 8000000 lists you create will allocate 20 bytes for the list object itself, plus 16 bytes for the vector of list elements. So you are trying to allocate at least (20+16) * 8000000 = 288000000 bytes, about 288 MB. And that's the best case, assuming the system malloc allocates only exactly as much memory as requested.

I calculated the size of the list object as follows:

  • 2 pointers in the PyListObject structure itself (see listobject.h)
  • 1 pointer and one Py_ssize_t for the PyObject_HEAD part of the list object (see object.h)
  • one Py_ssize_t for the PyObject_VAR_HEAD (also in object.h)
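
Putting those together, a quick sanity check of the 20 + 16 = 36 bytes figure (a sketch assuming 4-byte pointers and Py_ssize_t, i.e. a 32-bit build; the 4-slot element vector is explained just below):

ptr, ssize = 4, 4                               # 32-bit sizes
pylistobject = 2*ptr + (ptr + ssize) + ssize    # = 20 bytes, as listed above
element_vector = 4 * ptr                        # 4 over-allocated slots = 16
print pylistobject + element_vector             # 36 bytes per one-element list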

The vector of list elements is slightly overallocated to avoid having to resize it at each append - see list_resize in listobject.c. The sizes are 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... Thus, your one-element lists will allocate room for 4 elements.
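
For illustration, a minimal sketch that reproduces that growth pattern, mirroring the over-allocation formula in CPython 2.x's list_resize (exact constants may differ in other versions):

def list_growth(newsize):
    # CPython 2.x: new_allocated = newsize + (newsize >> 3)
    #              + (3 if newsize < 9 else 6)
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)

allocated = 0
sizes = []
for n in range(1, 89):
    if n > allocated:            # this append would trigger a resize
        allocated = list_growth(n)
        sizes.append(allocated)
print sizes                      # [4, 8, 16, 25, 35, 46, 58, 72, 88]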

Your data structure is a somewhat pathological example, paying the price of a variable-sized list object without utilizing it - all your lists have only a single element. You could avoid the 12 bytes of over-allocation by using tuples instead of lists, but to further reduce the memory consumption you would have to use a different data structure with fewer objects. It's hard to be more specific, as I don't know what you are trying to accomplish.
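
A quick way to see the per-object cost (a sketch; the exact byte counts depend on the Python version and on 32- vs 64-bit builds):

import sys

lst = []
lst.append(None)          # append goes through list_resize and over-allocates
tup = (None,)             # tuples are sized exactly; no spare slots

print sys.getsizeof(lst)  # list header plus room for 4 element pointers
print sys.getsizeof(tup)  # tuple header plus exactly 1 element pointer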

物价感观 2024-07-20 13:46:46

I'm a bit confused as to what you're asking. In that code example, nothing should be garbage collected, as you're never actually killing off any references. You're holding a reference to the top level list in a and you're adding nested lists (held in b at each iteration) inside of that. If you remove the 'a =', then you've got unreferenced objects.

Edit: In response to the first part, yes, Python holds a list of objects so it can know what to cull. Is that the whole question? If not, comment/edit your question and I'll do my best to help fill in the gaps.
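
On the gc angle, a small sketch (the gc module calls below are standard; the printed values are illustrative and vary by workload):

import gc

gc.disable()                  # turns off the *cyclic* collector...
print gc.isenabled()          # False
# ...but container objects are still tracked: every new list is linked
# into a per-generation structure so the collector can find it later.
print gc.get_count()          # per-generation allocation counters
print len(gc.get_objects())   # all objects currently tracked by the gc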

北城挽邺 2024-07-20 13:46:46

What are you trying to accomplish with the

a = b = []

and

b = b[0]

statements? It's certainly odd to see statements like that in Python, because they don't do what you might naively expect: in that example, a and b are two names for the same list (think pointers in C). If you're doing a lot of manipulation like that, it's easy to confuse the garbage collector (and yourself!) because you've got a lot of strange references floating around that haven't been properly cleared.

It's hard to diagnose what's wrong with that code without knowing why you want to do what it appears to be doing. Sure, it exposes a bit of interpreter weirdness... but I'm guessing you're approaching your problem in an odd way, and a more Pythonic approach might yield better results.
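
To see the aliasing concretely, a small sketch:

a = b = []
print a is b          # True: one list, two names

b.append([])          # mutates the shared list
b = b[0]              # b now names the inner list; a still names the outer
print a               # [[]]

b.append([])
b = b[0]
print a               # [[[]]] - a chain of nested one-element lists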

谁把谁当真 2024-07-20 13:46:46

So that you're aware of it, Python has its own small-object allocator (pymalloc). You can disable it by passing --without-pymalloc at the configure step.

However, the largest arena is 256KB so that shouldn't be the problem. You can also compile Python with debugging enabled, using --with-pydebug. This would give you more information about memory use.

I suspect your hunch is right, and I'm sure oefe's diagnosis is correct. A list uses contiguous memory, so if your list gets too large for a system arena, you're out of luck. If you're really adventurous, you could reimplement PyList to use multiple blocks, but that would be a lot of work, since various parts of Python expect contiguous data.
