Python的List是如何实现的?

发布于 2024-09-27 03:44:58 字数 47 浏览 5 评论 0原文

是链表还是数组?我环顾四周,只发现人们在猜测。我的C知识还不够好,无法看源代码。

Is it a linked list, an array? I searched around and only found people guessing. My C knowledge isn't good enough to look at the source code.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

晨光如昨 2024-10-04 03:44:58

实际上,C 代码非常简单。扩展一个宏,修剪掉一些不相关的注释,基本结构在 < code>listobject.h,它将列表定义为:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD 包含引用计数和类型标识符。所以,它是一个过度分配的向量/数组。当数组已满时调整数组大小的代码位于 listobject.c。它实际上并不使数组加倍,而是通过

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

每次分配容量来增长,其中 newsize 是请求的大小(不一定是 allocated + 1 因为您可以 >扩展任意数量的元素,而不是逐个追加它们)。

另请参阅 Python 常见问题解答

The C code is pretty simple, actually. Expanding one macro and pruning some irrelevant comments, the basic structure is in listobject.h, which defines a list as:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD contains a reference count and a type identifier. So, it's a vector/array that overallocates. The code for resizing such an array when it's full is in listobject.c. It doesn't actually double the array, but grows by allocating

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

to the capacity each time, where newsize is the requested size (not necessarily allocated + 1 because you can extend by an arbitrary number of elements instead of append'ing them one by one).

See also the Python FAQ.

手心的温暖 2024-10-04 03:44:58

我建议Laurent Luce 的文章“Python 列表实现”。对我来说真的很有用,因为作者解释了如何在 CPython 中实现该列表,并为此使用了优秀的图表。

列出对象C结构

CPython 中的列表对象由以下 C 结构表示。 ob_item 是指向列表元素的指针列表。 allocate 是内存中分配的槽数。

typedef 结构 {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t 已分配;
PyListObject;

请务必注意分配的槽位和列表大小之间的差异。列表的大小与len(l)相同。分配的槽数就是内存中已分配的槽数。通常,您会看到分配的值可能大于大小。这是为了避免每次将新元素追加到列表时都需要调用 realloc

...

追加

我们将一个整数附加到列表中:l.append(1)。会发生什么?
输入图像描述这里

我们继续添加一个元素:l.append(2)list_resize 是用 n+1 = 2 调用的,但由于分配的大小是 4,因此不需要分配更多内存。当我们添加 2 个整数时,也会发生同样的情况:l.append(3)l.append(4)。下图显示了我们到目前为止所拥有的内容。

在此输入图像描述

...

插入

让我们在位置 1 处插入一个新整数 (5):l.insert(1,5) 并看看内部发生了什么。 输入图像描述这里

...

流行

当弹出最后一个元素时:l.pop(),将调用listpop()list_resizelistpop() 内部调用,如果新大小小于分配大小的一半,则列表将缩小。在此处输入图像描述

您可以观察到插槽 4 仍然指向整数,但重要的是列表的大小现在为 4。
让我们再弹出一个元素。在 list_resize() 中,size – 1 = 4 – 1 = 3 小于已分配插槽的一半,因此列表缩小为 6 个插槽,列表的新大小现在为 3。

您可以观察到插槽 3 和 4 仍然指向一些整数,但重要的是列表的大小现在为 3。在此处输入图像描述

...

删除
Python 列表对象有一个删除特定元素的方法:l.remove(5)在此处输入图像描述

I would suggest Laurent Luce's article "Python list implementation". Was really useful for me because the author explains how the list is implemented in CPython and uses excellent diagrams for this purpose.

List object C structure

A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list.

...

Append

We append an integer to the list: l.append(1). What happens?
enter image description here

We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.

enter image description here

...

Insert

Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally. enter image description here

...

Pop

When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.enter image description here

You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4.
Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.

You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.enter image description here

...

Remove
Python list object has a method to remove a specific element: l.remove(5).enter image description here

看轻我的陪伴 2024-10-04 03:44:58

这是一个动态数组。实际证明:无论索引如何,索引都需要相同的时间(当然差异极小(0.0013 微秒!)):

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

如果 IronPython 或 Jython 使用链表,我会感到惊讶 - 它们会破坏许多广泛使用的库的性能假设列表是动态数组。

It's a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists - they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.

你的呼吸 2024-10-04 03:44:58

这取决于实现,但 IIRC:

  • CPython 使用指针数组
  • Jython 使用 ArrayList
  • IronPython 显然也使用数组。您可以浏览源代码来找出答案。

因此它们都具有 O(1) 随机访问。

This is implementation dependent, but IIRC:

  • CPython uses an array of pointers
  • Jython uses an ArrayList
  • IronPython apparently also uses an array. You can browse the source code to find out.

Thus they all have O(1) random access.

无法回应 2024-10-04 03:44:58

在 CPython 中,列表是指针数组。 Python 的其他实现可能会选择以不同的方式存储它们。

In CPython, lists are arrays of pointers. Other implementations of Python may choose to store them in different ways.

对不⑦ 2024-10-04 03:44:58

根据文档,

Python 的列表实际上是变长数组,而不是 Lisp 风格的链表。

According to the documentation,

Python’s lists are really variable-length arrays, not Lisp-style linked lists.

£噩梦荏苒 2024-10-04 03:44:58

正如其他人上面所说,列表(当相当大时)是通过分配固定量的空间来实现的,并且,如果该空间应该填满,则分配更大的空间并复制元素。

为了理解为什么该方法是 O(1) 摊销,不失一般性,假设我们已经插入了 a = 2^n 个元素,并且现在必须将表的大小加倍到 2^(n+1) 。这意味着我们当前正在执行 2^(n+1) 次操作。最后一个副本,我们做了 2^n 次操作。在此之前我们做了 2^(n-1)...一直到 8,4,2,1。现在,如果我们将它们相加,我们会得到 1 + 2 + 4 + 8 + ... + 2^(n+1) = 2^(n+2) - 1 < 4*2^n = O(2^n) = O(a) 总插入次数(即 O(1) 摊销时间)。另外,应该注意的是,如果表允许删除,则必须以不同的因子(例如 3x)进行表收缩

As others have stated above, the lists (when appreciably large) are implemented by allocating a fixed amount of space, and, if that space should fill, allocating a larger amount of space and copying over the elements.

To understand why the method is O(1) amortized, without loss of generality, assume we have inserted a = 2^n elements, and we now have to double our table to 2^(n+1) size. That means we're currently doing 2^(n+1) operations. Last copy, we did 2^n operations. Before that we did 2^(n-1)... all the way down to 8,4,2,1. Now, if we add these up, we get 1 + 2 + 4 + 8 + ... + 2^(n+1) = 2^(n+2) - 1 < 4*2^n = O(2^n) = O(a) total insertions (i.e. O(1) amortized time). Also, it should be noted that if the table allows deletions the table shrinking has to be done at a different factor (e.g 3x)

离旧人 2024-10-04 03:44:58

Python 中的列表类似于数组,可以在其中存储多个值。列表是可变的,这意味着您可以更改它。你应该知道的更重要的事情是,当我们创建一个列表时,Python 会自动为该列表变量创建一个reference_id。如果您通过分配其他变量来更改它,则主列表将会更改。让我们尝试举个例子:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

我们追加了 my_list 但我们的主列表已更改。这意味着列表没有指定为复制列表指定为其参考。

A list in Python is something like an array, where you can store multiple values. List is mutable that means you can change it. The more important thing you should know, when we create a list, Python automatically creates a reference_id for that list variable. If you change it by assigning others variable the main list will be change. Let's try with a example:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

We append my_list but our main list has changed. That mean's list didn't assign as a copy list assign as its reference.

空城旧梦 2024-10-04 03:44:58

我发现这篇文章对于理解如何使用Python代码实现列表非常有帮助。

class OhMyList:
    def __init__(self):
        self.length = 0
        self.capacity = 8
        self.array = (self.capacity * ctypes.py_object)()

    def append(self, item):
        if self.length == self.capacity:
            self._resize(self.capacity*2)
        self.array[self.length] = item
        self.length += 1

    def _resize(self, new_cap):
        new_arr = (new_cap * ctypes.py_object)()
        for idx in range(self.length):
            new_arr[idx] = self.array[idx]
        self.array = new_arr
        self.capacity = new_cap

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self.array[idx]

I found this article really helpful to understand how lists are implemented using python code.

class OhMyList:
    def __init__(self):
        self.length = 0
        self.capacity = 8
        self.array = (self.capacity * ctypes.py_object)()

    def append(self, item):
        if self.length == self.capacity:
            self._resize(self.capacity*2)
        self.array[self.length] = item
        self.length += 1

    def _resize(self, new_cap):
        new_arr = (new_cap * ctypes.py_object)()
        for idx in range(self.length):
            new_arr[idx] = self.array[idx]
        self.array = new_arr
        self.capacity = new_cap

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return self.array[idx]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文