当前位置：文江博客话题详情

在Python中切片列表而不生成副本

发布于 2024-10-19 07:28:50 字数 155 浏览 5 评论 0原文

给定一个整数列表 L，我需要为 [0, len(L) - 1] 中的 k 生成所有子列表 L[k:] ，不生成副本。

我如何在 Python 中完成这个任务？以某种方式使用缓冲区对象？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

微暖i 2024-10-26 07:28:50

简短的回答

切片列表不会生成列表中对象的副本；它只是复制对它们的引用。这就是所问问题的答案。

长答案

测试可变和不可变值

首先，让我们测试基本声明。我们可以证明，即使在像整数这样的不可变对象的情况下，也只复制引用。这是三个不同的整数对象，每个对象具有相同的值：

>>> a = [1000 + 1, 1000 + 1, 1000 + 1]

它们具有相同的值，但您可以看到它们是三个不同的对象，因为它们具有不同的 id：

>>> map(id, a)
[140502922988976, 140502922988952, 140502922988928]

当您对它们进行切片时，引用仍然存在相同。没有创建新对象：

>>> b = a[1:3]
>>> map(id, b)
[140502922988952, 140502922988928]

使用具有相同值的不同对象表明复制过程不会打扰 interning< /a>——它只是直接复制引用。

使用可变值进行测试会得到相同的结果：

>>> a = [{0: 'zero', 1: 'one'}, ['foo', 'bar']]
>>> map(id, a)
[4380777000, 4380712040]
>>> map(id, a[1:]
... )
[4380712040]

检查剩余内存开销

当然，引用本身会被复制。在 64 位机器上，每个字节占用 8 个字节。每个列表都有自己的 72 字节内存开销：

>>> for i in range(len(a)):
...     x = a[:i]
...     print('len: {}'.format(len(x)))
...     print('size: {}'.format(sys.getsizeof(x)))
... 
len: 0
size: 72
len: 1
size: 80
len: 2
size: 88

正如 Joe Pinsonault 提醒我们，开销会增加。整数对象本身并不是很大——它们比引用大三倍。因此，这从绝对意义上为您节省了一些内存，但渐近地，能够拥有多个作为同一内存“视图”的列表可能会很好。

使用视图节省内存

不幸的是，Python 没有提供简单的方法来生成“视图”到列表中的对象。或者我应该说“幸运”！这意味着您不必担心切片来自哪里；对原始切片的更改不会影响切片。总的来说，这使得对程序行为的推理变得更加容易。

如果您确实想通过使用视图来节省内存，请考虑使用 numpy 数组。当您对 numpy 数组进行切片时，内存在切片和原始数组之间共享：

>>> a = numpy.arange(3)
>>> a
array([0, 1, 2])
>>> b = a[1:3]
>>> b
array([1, 2])

当我们修改 a 并再次查看 b 时会发生什么？

>>> a[2] = 1001
>>> b
array([   1, 1001])

但这意味着您必须确保在修改一个对象时，不会无意中修改另一个对象。这就是使用 numpy 时的权衡：计算机的工作量减少，程序员的工作量增加！

The short answer

Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.

The long answer

Testing on mutable and immutable values

First, let's test the basic claim. We can show that even in the case of immutable objects like integers, only the reference is copied. Here are three different integer objects, each with the same value:

>>> a = [1000 + 1, 1000 + 1, 1000 + 1]

They have the same value, but you can see they are three distinct objects because they have different ids:

>>> map(id, a)
[140502922988976, 140502922988952, 140502922988928]

When you slice them, the references remain the same. No new objects have been created:

>>> b = a[1:3]
>>> map(id, b)
[140502922988952, 140502922988928]

Using different objects with the same value shows that the copy process doesn't bother with interning -- it just directly copies the references.

Testing with mutable values gives the same result:

>>> a = [{0: 'zero', 1: 'one'}, ['foo', 'bar']]
>>> map(id, a)
[4380777000, 4380712040]
>>> map(id, a[1:]
... )
[4380712040]

Examining remaining memory overhead

Of course the references themselves are copied. Each one costs 8 bytes on a 64-bit machine. And each list has its own memory overhead of 72 bytes:

>>> for i in range(len(a)):
...     x = a[:i]
...     print('len: {}'.format(len(x)))
...     print('size: {}'.format(sys.getsizeof(x)))
... 
len: 0
size: 72
len: 1
size: 80
len: 2
size: 88

As Joe Pinsonault reminds us, that overhead adds up. And integer objects themselves are not very large -- they are three times larger than references. So this saves you some memory in an absolute sense, but asymptotically, it might be nice to be able to have multiple lists that are "views" into the same memory.

Saving memory by using views

Unfortunately, Python provides no easy way to produce objects that are "views" into lists. Or perhaps I should say "fortunately"! It means you don't have to worry about where a slice comes from; changes to the original won't affect the slice. Overall, that makes reasoning about a program's behavior much easier.

If you really want to save memory by working with views, consider using numpy arrays. When you slice a numpy array, the memory is shared between the slice and the original:

>>> a = numpy.arange(3)
>>> a
array([0, 1, 2])
>>> b = a[1:3]
>>> b
array([1, 2])

What happens when we modify a and look again at b?

>>> a[2] = 1001
>>> b
array([   1, 1001])

But this means you have to be sure that when you modify one object, you aren't inadvertently modifying another. That's the trade-off when you use numpy: less work for the computer, and more work for the programmer!

回复收藏 0 原文

友欢 2024-10-26 07:28:50

根据您正在做的事情，您也许可以使用 islice< /代码>。

由于它通过迭代进行操作，因此不会创建新列表，而只是创建迭代器，根据其范围的要求从原始列表中生成元素。

回复收藏 0 原文

多谢你的绝情让我学会死心 2024-10-26 07:28:50

一个简单的替代 islice 的方法t 迭代不需要的列表项：

def listslice(xs, *args):
    for i in range(len(xs))[slice(*args)]:
        yield xs[i]

用法：

>>> xs = [0, 2, 4, 6, 8, 10]

>>> for x in listslice(xs, 2, 4):
...     print(x)
4
6

A simple alternative to islice that doesn't iterate through list items that it doesn't need to:

def listslice(xs, *args):
    for i in range(len(xs))[slice(*args)]:
        yield xs[i]

Usage:

>>> xs = [0, 2, 4, 6, 8, 10]

>>> for x in listslice(xs, 2, 4):
...     print(x)
4
6

回复收藏 0 原文

三生一梦 2024-10-26 07:28:50

一般来说，列表切片是最好的选择。

以下是一个快速的性能比较：

from timeit import timeit
from itertools import islice

for size in (10**4, 10**5, 10**6):
    L = list(range(size))
    S = size // 2
    def sum_slice(): return sum(L[S:])
    def sum_islice(): return sum(islice(L, S, None))
    def sum_for(): return sum(L[i] for i in range(S, len(L)))

    assert sum_slice() == sum_islice()
    assert sum_slice() == sum_for()

    for method in (sum_slice, sum_islice, sum_for):
        print(f'Size={size}, method={method.__name__}, time={timeit(method, number=1000)} ms')

结果：

Size=10000,   method=sum_slice,  time=0.0298 ms
Size=10000,   method=sum_islice, time=0.0449 ms
Size=10000,   method=sum_for,    time=0.2500 ms
Size=100000,  method=sum_slice,  time=0.3262 ms
Size=100000,  method=sum_islice, time=0.4492 ms
Size=100000,  method=sum_for,    time=2.4849 ms
Size=1000000, method=sum_slice,  time=5.4092 ms
Size=1000000, method=sum_islice, time=5.1139 ms
Size=1000000, method=sum_for,    time=26.198 ms

Generally, list slicing is the best option.

Here is a quick performance comparison:

from timeit import timeit
from itertools import islice

for size in (10**4, 10**5, 10**6):
    L = list(range(size))
    S = size // 2
    def sum_slice(): return sum(L[S:])
    def sum_islice(): return sum(islice(L, S, None))
    def sum_for(): return sum(L[i] for i in range(S, len(L)))

    assert sum_slice() == sum_islice()
    assert sum_slice() == sum_for()

    for method in (sum_slice, sum_islice, sum_for):
        print(f'Size={size}, method={method.__name__}, time={timeit(method, number=1000)} ms')

Results:

Size=10000,   method=sum_slice,  time=0.0298 ms
Size=10000,   method=sum_islice, time=0.0449 ms
Size=10000,   method=sum_for,    time=0.2500 ms
Size=100000,  method=sum_slice,  time=0.3262 ms
Size=100000,  method=sum_islice, time=0.4492 ms
Size=100000,  method=sum_for,    time=2.4849 ms
Size=1000000, method=sum_slice,  time=5.4092 ms
Size=1000000, method=sum_islice, time=5.1139 ms
Size=1000000, method=sum_for,    time=26.198 ms

回复收藏 0 原文