从列表/队列中删除一些项目的快速方法
这是类似 的后续内容问题询问了最好的写作方式
for item in somelist:
if determine(item):
code_to_remove_item
,似乎共识是这样的:
somelist[:] = [x for x in somelist if not determine(x)]
但是,我认为如果你只删除一些项目,大多数项目都会被复制到同一个对象中,也许这就是慢的。在另一个相关的答案中问题,有人建议:
for item in reversed(somelist):
if determine(item):
somelist.remove(item)
但是,这里< code>list.remove 将搜索该项目,该项目的列表长度为 O(N)。可能我们受到限制,因为列表表示为数组,而不是链接列表,因此删除项目将需要移动其后面的所有内容。不过,建议此处将collections.dequeue 表示为双向链表。然后应该可以在迭代时以 O(1) 的时间进行删除。我们实际上如何实现这一目标?
更新: 我也做了一些时间测试,使用以下代码:
import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)
并得到:
list comp = 4.01255393028
filtering = 3.59962391853
This is a follow up to a similar question which asked the best way to write
for item in somelist:
if determine(item):
code_to_remove_item
and it seems the consensus was on something like
somelist[:] = [x for x in somelist if not determine(x)]
However, I think if you are only removing a few items, most of the items are being copied into the same object, and perhaps that is slow. In an answer to another related question, someone suggests:
for item in reversed(somelist):
if determine(item):
somelist.remove(item)
However, here the list.remove
will search for the item, which is O(N) in the length of the list. May be we are limited in that the list is represented as an array, rather than a linked list, so removing items will need to move everything after it. However, it is suggested here that collections.dequeue is represented as a doubly linked list. It should then be possible to remove in O(1) while iterating. How would we actually accomplish this?
Update:
I did some time testing as well, with the following code:
import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)
and got:
list comp = 4.01255393028
filtering = 3.59962391853
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
列表推导式是渐近最优解:
它只遍历列表一次,因此运行时间为 O(n) 。由于您需要对每个对象调用确定(),因此任何算法都至少需要 O(n) 次操作。列表理解确实必须进行一些复制,但它只是复制对对象的引用,而不是复制对象本身。
在 Python 中从列表中删除项目的时间复杂度为 O(n),因此循环内任何带有删除、弹出或删除的操作都将是 O(n**2)。
此外,在 CPython 中,列表推导式比 for 循环更快。
The list comprehension is the asymptotically optimal solution:
It only makes one pass over the list, so runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it's only copying references to the objects not copying the objects themselves.
Removing items from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.
如果您需要在 O(1) 中删除项目,您可以使用 HashMap
If you need to remove item in O(1) you can use HashMaps
由于
list.remove
相当于del list[list.index(x)]
,您可以这样做:但是:您不应该修改迭代列表时。它迟早会咬你。首先使用
过滤
或列表理解,然后再优化。Since
list.remove
is equivalent todel list[list.index(x)]
, you could do:But: you should not modify the list while iterating over it. It will bite you, sooner or later. Use
filter
or list comprehension first, and optimise later.双端队列针对头部和尾部移除进行了优化,而不是针对中间的任意移除。删除本身很快,但您仍然需要遍历列表到删除点。如果您要迭代整个长度,那么过滤双端队列和过滤列表(使用
filter
或推导式)之间的唯一区别是复制的开销,最坏的情况是常量倍数;它仍然是一个 O(n) 操作。另请注意,列表中的对象并未被复制,仅复制对它们的引用。所以开销并没有那么大。您可能可以避免像这样进行复制,但我没有特别的理由相信这比直接的列表理解更快 - 它可能不是:
A deque is optimized for head and tail removal, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you're iterating through the entire length, then the only difference between filtering a deque and filtering a list (using
filter
or a comprehension) is the overhead of copying, which at worst is a constant multiple; it's still a O(n) operation. Also, note that the objects in the list aren't being copied -- just the references to them. So it's not that much overhead.It's possible that you could avoid copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension -- it's probably not:
我对此进行了尝试。我的解决方案速度较慢,但需要较少的内存开销(即不创建新数组)。在某些情况下甚至可能更快!
此代码自首次发布以来已被编辑
我在使用 timeit 时遇到了问题,我可能做错了。
这是我的结果,
我一定是 psyco 做错了什么,因为它实际上运行得更慢。
I took a stab at this. My solution is slower, but requires less memory overhead (i.e. doesn't create a new array). It might even be faster in some circumstances!
This code has been edited since its first posting
I had problems with timeit, I might be doing this wrong.
And here are the results
I must be doing something wrong with psyco, because it is actually running slower.
元素不会被列表理解复制,
这花了我一段时间才弄清楚。请参阅下面的示例代码,自行尝试不同的方法
代码
您可以指定复制列表元素所需的时间以及评估所需的时间。事实证明,复制时间与列表理解无关。
输出
elements are not copied by list comprehension
this took me a while to figure out. See the example code below, to experiment yourself with different approaches
code
You can specify how long a list element takes to copy and how long it takes to evaluate. The time to copy is irrelevant for list comprehension, as it turned out.
output
而不是检查元素是否存在。使用“尝试除外”。
我猜这更快
INSTEAD OF CHECKING IF ELEMENT IS THERE. USING TRY EXCEPT.
I GUESS THIS FASTER