Python & numpy: sum of array slice
I have a 1-dimensional numpy array (array_) and a Python list (list_).

The following code works, but is inefficient because slices involve an unnecessary copy (certainly for Python lists, and I believe also for numpy arrays?):

result = sum(array_[1:])
result = sum(list_[1:])

What's a good way to rewrite that?
Answers (4)
Slicing a numpy array doesn't make a copy, as it does in the case of a list.
As a basic example: slice an array, modify the slice, and the original changes too. Even though we modified the values in y, it's just a view into the same memory as x. Slicing an ndarray returns a view and doesn't duplicate the memory.
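The code for this example was stripped from the page; a minimal sketch of the kind of demonstration the answer describes (variable names x and y follow the answer's prose, the values are assumed):

```python
import numpy as np

x = np.arange(5)   # x is [0, 1, 2, 3, 4]
y = x[1:]          # a view into x, not a copy

y[0] = 100         # writing through the view...
print(x)           # ...changes x too: x is now [0, 100, 2, 3, 4]
```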
However, it would be much more efficient to use array_[1:].sum() rather than calling python's builtin sum on a numpy array.

As a quick comparison:
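The benchmark numbers were stripped from the page; a sketch of how such a comparison can be run with timeit (array size and repeat count are assumed):

```python
import timeit

setup = "import numpy as np; array_ = np.arange(100000)"

# ndarray method: the loop runs in C over the underlying buffer
t_method = timeit.timeit("array_[1:].sum()", setup=setup, number=100)

# builtin sum: iterates element by element in Python
t_builtin = timeit.timeit("sum(array_[1:])", setup=setup, number=100)

print(t_method, t_builtin)
```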
Edit:
In the case of the list, if for some reason you don't want to make a copy, you could always use itertools.islice: instead of summing a slice, you sum an islice over the same range.
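The before/after snippets were stripped from the page; reconstructing them from the question's code, the replacement looks like this:

```python
from itertools import islice

list_ = list(range(10))

# Original: copies the tail of the list before summing
result = sum(list_[1:])

# With islice: iterates lazily over the same elements, no copy
result = sum(islice(list_, 1, None))

print(result)  # 45 either way
```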
In most cases this is overkill, though. If you're dealing with lists long enough that memory management is a major issue, then you probably shouldn't be using a list to store your values. (Lists are not designed or intended to store items compactly in memory.)
Also, you wouldn't want to do this for a numpy array. Simply doing some_array[1:].sum() will be several orders of magnitude faster and won't use any more memory than islice.
My first instinct was the same as Joe Kington's when it comes to lists, but I checked, and on my machine at least, islice is consistently slower!

I tried a custom_sum and found that it was faster, but not by much. Furthermore, at larger numbers, it was slower by far!
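The custom_sum definition and timings were stripped from the page. One plausible reconstruction (the body is assumed) that avoids the slice copy, and that would plausibly lose to the builtin on large inputs since it loops in Python:

```python
def custom_sum(list_):
    # Index-based loop: skips the first element without copying the tail,
    # but pays Python-level loop overhead on every element
    total = 0
    for i in range(1, len(list_)):
        total += list_[i]
    return total

list_ = [3, 1, 4, 1, 5]
print(custom_sum(list_))  # 11, same as sum(list_[1:])
```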
I couldn't think of anything else to test. (Thoughts, anyone?)
@Joe Kington (this is a temporary answer just to show my timings; I'll remove it soon):

As far as my numpy (1.5.1) source tells, sum(.) is just a wrapper for x.sum(.). Thus with larger inputs the execution time is the same (asymptotically) for sum(.) and x.sum(.).

Edit: This answer was intended to be just a temporary one, but actually it (and its comments) may indeed be useful to someone. So I'll just leave it as it is for now, until someone really requests that I delete it.
I don't find x[1:].sum() significantly slower than x.sum(). For lists, sum(x) - x[0] is faster than sum(x[1:]) (about 40% faster on my machine).
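The timings behind this answer were stripped from the page; a sketch of how the list comparison can be reproduced with timeit (list size and repeat count are assumed):

```python
import timeit

setup = "x = list(range(100000))"

# Sum everything, then subtract the skipped head: no copy made
t_subtract = timeit.timeit("sum(x) - x[0]", setup=setup, number=100)

# Slice first: copies the tail of the list before summing
t_slice = timeit.timeit("sum(x[1:])", setup=setup, number=100)

print(t_subtract, t_slice)
```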