In what structure is a Python object stored in memory?

Posted on 2024-09-30 03:45:43

Say I have a class A:

class A(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        return str(self.x)  # __str__ must return a str, even when x is an int

And I use sys.getsizeof to see how many bytes an instance of A takes:

>>> import sys
>>> sys.getsizeof(A(1))
64
>>> sys.getsizeof(A('a'))
64
>>> sys.getsizeof(A('aaa'))
64

As illustrated in the experiment above, the size of an A object is the same no matter what self.x is.

So how does Python store an object internally?

等待我真够勒 2024-10-07 03:45:43

It depends on what kind of object, and also which Python implementation :-)

In CPython, which is what most people use when they use Python, every Python object is represented by a C struct, PyObject. Everything that 'stores an object' really stores a PyObject *. The PyObject struct holds the bare minimum of information: the object's type (a pointer to a type object, which is itself a PyObject) and its reference count (a Py_ssize_t integer). Types defined in C extend this struct with the extra information they need to store in the object itself, and sometimes allocate extra data separately.
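As a rough illustration of the header described above, the sketch below reads the type pointer straight out of an object's memory with `ctypes`. It assumes a standard (non-debug, non-free-threaded) CPython build, where `id(obj)` happens to be the object's address and the header is laid out as `ob_refcnt` followed by `ob_type`; neither is guaranteed by the language, and the helper name `header_type_pointer` is made up for this example.

```python
import ctypes
import struct
import sys

# Assumption: standard CPython build (not debug, not free-threaded), where
# id(obj) is the object's address and the header is {ob_refcnt, ob_type}.
PTR_SIZE = struct.calcsize("P")  # bytes in a C pointer (same as Py_ssize_t here)


def header_type_pointer(obj):
    """Hypothetical helper: read the ob_type field that follows ob_refcnt."""
    return ctypes.c_void_p.from_address(id(obj) + PTR_SIZE).value


o = object()
# The stored type pointer is the address of the type object itself.
print(header_type_pointer(o) == id(type(o)))
# A bare object() is nothing but the header: one refcount plus one type pointer.
print(sys.getsizeof(o) == 2 * PTR_SIZE)
```

Poking at raw memory like this is only useful for exploration; ordinary code should use `type(obj)` and `sys.getrefcount(obj)` instead.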

For example, tuples (implemented as a PyTupleObject that "extends" the PyObject struct) store their length and the PyObject pointers they contain inside the struct itself. The struct definition declares a 1-element array, but the implementation allocates a block of memory just large enough for the PyTupleObject header plus exactly as many items as the tuple holds. In the same way, strings (PyStringObject, Python 2's str) store their length, their cached hash value, some string-caching ("interning") bookkeeping, and the character data itself, inline at the end of the struct rather than behind a separate char*. Tuples and strings are thus single blocks of memory.
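The "items stored inside the struct" layout is visible from Python through a type's `__itemsize__` attribute, which mirrors the C-level `tp_itemsize`. This sketch assumes CPython and uses `bytes`, the Python 3 counterpart of Python 2's `str`/`PyStringObject`, because Python 3's `str` uses a different internal representation.

```python
import struct
import sys

PTR_SIZE = struct.calcsize("P")

# Each extra tuple element costs exactly one PyObject* inside the same block.
print(tuple.__itemsize__ == PTR_SIZE)
print(sys.getsizeof((1, 2, 3)) - sys.getsizeof((1, 2)) == PTR_SIZE)

# bytes keeps its characters inline, one byte per character.
print(bytes.__itemsize__ == 1)
print(sys.getsizeof(b"abc") - sys.getsizeof(b"ab") == 1)
```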

On the other hand, lists (PyListObject) store their length, a PyObject ** pointing to their item array, and another Py_ssize_t that records how much room they have allocated for that array. Because Python stores PyObject pointers everywhere, a PyObject struct cannot grow once it has been allocated: growing it might require moving it, which would mean finding every pointer to it and updating them. Since a list may need to grow, it has to allocate its data separately from the PyObject struct. Tuples and strings cannot grow, so they don't need this. Dicts (PyDictObject) work the same way, except that each entry stores the key, the value and the key's cached hash value rather than a single item. Dicts also carry some extra overhead to accommodate small dicts and specialized lookup functions.
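Because a list's item array lives in a separate, over-allocated buffer, the list object keeps its identity while that buffer is resized, and `sys.getsizeof` (which counts the buffer's allocated capacity, not the items themselves) jumps in steps rather than growing on every `append`. A small experiment, assuming CPython; the exact growth pattern is an implementation detail that varies between versions.

```python
import sys

lst = []
original_id = id(lst)
sizes = []
for i in range(32):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The PyListObject never moved, even though its item buffer was reallocated.
print(id(lst) == original_id)
# Capacity grows in chunks, so many appends leave the reported size unchanged.
print(sorted(set(sizes)))
# list is not variable-size at the C level: no items live inside the struct.
print(list.__itemsize__)  # 0
```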

But these are all types defined in C, and you can usually work out how much memory they use just by reading the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes (Python 2 only), is not too difficult: such an instance is a PyObject that stores a PyObject * to its class (which is not the same thing as the type already stored in the PyObject header), a PyObject * to its __dict__ (which holds all other instance attributes) and a PyObject * to its weakreflist (used by the weakref module and only initialized when needed). An instance's __dict__ is usually unique to that instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But the dict doesn't have to be specific to the instance: __dict__ can be reassigned just fine.
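Classic classes no longer exist in Python 3, but the point about `__dict__` being a separate, reassignable object still holds for ordinary classes. The sketch below, assuming CPython, reuses the `A` class from the question: it reports the instance and its attribute dict as two separate sizes, then points two instances at one shared dict.

```python
import sys


class A(object):
    def __init__(self, x):
        self.x = x


a = A("aaa")
b = A(1)

# The instance and its attribute dict are separate objects, so a fuller
# estimate of an instance's footprint adds the dict's size on top.
print(sys.getsizeof(a), sys.getsizeof(a.__dict__))

# __dict__ is not necessarily private to one instance: it can be reassigned,
# here to a single dict shared by both instances.
shared = {"x": "shared"}
a.__dict__ = shared
b.__dict__ = shared
print(a.x, b.x)                  # shared shared
print(a.__dict__ is b.__dict__)  # True
```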

New-style classes complicate matters. All classic instances share a single C type, which is why they need a separate pointer to their class; an instance of a new-style class, by contrast, has its class as its type, so the class does not need to be stored separately. By default, new-style instances do have room for the __dict__ and weakreflist references, but unlike classic instances they don't require a __dict__ for arbitrary attributes. If the class (and all its base classes) uses __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for their values directly in the instance's struct, just as types written in C do. Each entry in __slots__ therefore takes up one PyObject *, whether or not the attribute is set.
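A quick way to check the "one pointer per slot" claim is to compare `__basicsize__` (the C-level `tp_basicsize`) of two slotted classes. The class names below are made up for this sketch, which assumes CPython.

```python
import struct

PTR_SIZE = struct.calcsize("P")


class OneSlot(object):
    __slots__ = ("x",)


class TwoSlots(object):
    __slots__ = ("x", "y")


# Each slot reserves one PyObject* in the instance struct itself.
print(TwoSlots.__basicsize__ - OneSlot.__basicsize__ == PTR_SIZE)

s = TwoSlots()
s.x = 1  # y is never set, but its pointer is still reserved

print(hasattr(s, "__dict__"))  # False: no per-instance dict is allocated
try:
    s.z = 3  # not declared in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)
```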

All that said, the problem remains that since everything in Python is an object and everything that holds an object just holds a reference, it's sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them, but if so, which one? Or would you say they own half the data, even though getting rid of one object doesn't release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also get rid of its reference to that data, causing the data to be cleaned up after all.
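The ownership question has no single right answer, but any measuring tool has to pick a convention. The sketch below uses a hypothetical helper, `deep_sizeof` (not part of heapy or any library), which walks references with `gc.get_referents` and counts every reachable object exactly once per measurement. Measuring two containers separately counts their shared data twice; measuring them together counts it once. It assumes CPython and does not try to model weak references.

```python
import gc
import sys


def deep_sizeof(*roots):
    """Hypothetical helper: sum sys.getsizeof over everything reachable from
    roots, counting each object once (shared objects are not double-counted)."""
    seen = set()
    stack = list(roots)
    total = 0
    while stack:
        obj = stack.pop()
        # Skip objects already counted, and type objects, which are shared
        # by every instance and would otherwise dominate the total.
        if id(obj) in seen or isinstance(obj, type):
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # Objects without a tp_traverse slot (such as ints) report no referents.
        stack.extend(gc.get_referents(obj))
    return total


shared = list(range(1000))
a = [shared]
b = [shared]

separately = deep_sizeof(a) + deep_sizeof(b)  # `shared` counted twice
together = deep_sizeof(a, b)                   # `shared` counted once
print(separately > together)
print(together - deep_sizeof(a) == sys.getsizeof(b))
```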

Fortunately, the common case is fairly easy to figure out. There are memory debuggers for Python, such as heapy, that do a reasonable job of keeping track of these things. And as long as your class (and its base classes) is reasonably simple, you can make an educated guess at how much memory it will take up, especially in large numbers. If you really want to know the exact sizes of your data structures, consult the CPython source: most built-in types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it's pointers all the way down, and those take up room too.
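Short of reading the headers, you can often get the fixed part of a built-in type's struct from its `__basicsize__` attribute (the C-level `tp_basicsize`), and its per-item cost from `__itemsize__`. Neither includes the garbage-collector header or any separately allocated buffers. This sketch assumes a standard CPython build, where a `list` header is five machine words: refcount, type pointer, length, item pointer and allocated capacity.

```python
import struct

PTR_SIZE = struct.calcsize("P")

for t in (object, list, dict, tuple, bytes):
    print(f"{t.__name__:8} basicsize={t.__basicsize__:3} itemsize={t.__itemsize__}")

# PyListObject: ob_refcnt, ob_type, ob_size, ob_item, allocated
print(list.__basicsize__ == 5 * PTR_SIZE)
```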

蓝眼泪 2024-10-07 03:45:43

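In the case of a class instance, getsizeof() returns the size of the PyObject struct itself, as returned by the C function PyInstance_New(); it does not include the objects that struct points to, such as the instance's __dict__ or the value of self.x.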

If you want a list of the sizes of all object types, check this.
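To see what `sys.getsizeof` does and does not count for the class from the question, compare the instance with the object it points to. This sketch assumes CPython, where `sys.getsizeof` reports only the object's own memory (plus garbage-collector overhead) and never follows references, which is why the size in the question never changed.

```python
import sys


class A(object):
    def __init__(self, x):
        self.x = x


small = A(1)
big = A("a" * 10_000)

# The instance itself has the same fixed size regardless of what x refers to...
print(sys.getsizeof(small) == sys.getsizeof(big))
# ...while the attribute value is a separate object with its own size.
print(sys.getsizeof(big.x) > sys.getsizeof(small.x))
```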
