从列表/队列中删除一些项目的快速方法

发布于 2024-11-02 20:57:44 字数 1613 浏览 1 评论 0原文

这是类似 的后续内容问题询问了最好的写作方式

for item in somelist:
    if determine(item):
         code_to_remove_item

,似乎共识是这样的:

somelist[:] = [x for x in somelist if not determine(x)]

但是,我认为如果你只删除一些项目,大多数项目都会被复制到同一个对象中,也许这就是慢的。在另一个相关的答案问题,有人建议:

for item in reversed(somelist):
    if determine(item):
        somelist.remove(item)

但是,这里< code>list.remove 将搜索该项目,该项目的列表长度为 O(N)。可能我们受到限制,因为列表表示为数组,而不是链接列表,因此删除项目将需要移动其后面的所有内容。不过,建议此处将collections.dequeue 表示为双向链表。然后应该可以在迭代时以 O(1) 的时间进行删除。我们实际上如何实现这一目标?

更新: 我也做了一些时间测试,使用以下代码:

import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)

并得到:

list comp =  4.01255393028
filtering =  3.59962391853

This is a follow up to a similar question which asked the best way to write

for item in somelist:
    if determine(item):
         code_to_remove_item

and it seems the consensus was on something like

somelist[:] = [x for x in somelist if not determine(x)]

However, I think if you are only removing a few items, most of the items are being copied into the same object, and perhaps that is slow. In an answer to another related question, someone suggests:

for item in reversed(somelist):
    if determine(item):
        somelist.remove(item)

However, here the list.remove will search for the item, which is O(N) in the length of the list. May be we are limited in that the list is represented as an array, rather than a linked list, so removing items will need to move everything after it. However, it is suggested here that collections.dequeue is represented as a doubly linked list. It should then be possible to remove in O(1) while iterating. How would we actually accomplish this?

Update:
I did some time testing as well, with the following code:

import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)

and got:

list comp =  4.01255393028
filtering =  3.59962391853

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

和我恋爱吧 2024-11-09 20:57:44

列表推导式是渐近最优解:

somelist = [x for x in somelist if not determine(x)]

它只遍历列表一次,因此运行时间为 O(n) 。由于您需要对每个对象调用确定(),因此任何算法都至少需要 O(n) 次操作。列表理解确实必须进行一些复制,但它只是复制对对象的引用,而不是复制对象本身。

在 Python 中从列表中删除项目的时间复杂度为 O(n),因此循环内任何带有删除、弹出或删除的操作都将是 O(n**2)。

此外,在 CPython 中,列表推导式比 for 循环更快。

The list comprehension is the asymptotically optimal solution:

somelist = [x for x in somelist if not determine(x)]

It only makes one pass over the list, so runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it's only copying references to the objects not copying the objects themselves.

Removing items from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).

Also, in CPython list comprehensions are faster than for loops.

烦人精 2024-11-09 20:57:44

如果您需要在 O(1) 中删除项目,您可以使用 HashMap

If you need to remove item in O(1) you can use HashMaps

如果没有 2024-11-09 20:57:44

由于 list.remove 相当于 del list[list.index(x)],您可以这样做:

for idx, item in enumerate(somelist):
    if determine(item):
        del somelist[idx]

但是:您不应该修改迭代列表时。它迟早会咬你。首先使用过滤或列表理解,然后再优化。

Since list.remove is equivalent to del list[list.index(x)], you could do:

for idx, item in enumerate(somelist):
    if determine(item):
        del somelist[idx]

But: you should not modify the list while iterating over it. It will bite you, sooner or later. Use filter or list comprehension first, and optimise later.

你与清晨阳光 2024-11-09 20:57:44

双端队列针对头部和尾部移除进行了优化,而不是针对中间的任意移除。删除本身很快,但您仍然需要遍历列表到删除点。如果您要迭代整个长度,那么过滤双端队列和过滤列表(使用 filter 或推导式)之间的唯一区别是复制的开销,最坏的情况是常量倍数;它仍然是一个 O(n) 操作。另请注意,列表中的对象并未被复制,仅复制对它们的引用。所以开销并没有那么大。

您可能可以避免像这样进行复制,但我没有特别的理由相信这比直接的列表理解更快 - 它可能不是:

write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
         write_i += 1
del L[write_i:]

A deque is optimized for head and tail removal, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you're iterating through the entire length, then the only difference between filtering a deque and filtering a list (using filter or a comprehension) is the overhead of copying, which at worst is a constant multiple; it's still a O(n) operation. Also, note that the objects in the list aren't being copied -- just the references to them. So it's not that much overhead.

It's possible that you could avoid copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension -- it's probably not:

write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
         write_i += 1
del L[write_i:]
坐在坟头思考人生 2024-11-09 20:57:44

我对此进行了尝试。我的解决方案速度较慢,但​​需要较少的内存开销(即不创建新数组)。在某些情况下甚至可能更快!

此代码自首次发布以来已被编辑

我在使用 timeit 时遇到了问题,我可能做错了。

import timeit
setup = """

import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)


# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
   c[:] = [x for x in b if tokeep(x)]
listcomp()

b = setup_b[:]
def filt():
   c = filter(tokeep, b)
filt()

b = setup_b[:]
def forfilt():
   marked = (i for i, x in enumerate(b) if tokeep(x))
   shift = 0
   for n in marked:
      del b[n - shift]
      shift += 1
forfilt()

b = setup_b[:]
def forfiltCheating():
   marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))

   shift = 0
   for n in marked:
      del b[n - shift]
      shift += 1
forfiltCheating()

"""

listcomp = """
b = setup_b[:]

listcomp()
"""

filt = """
b = setup_b[:]

filt()
"""

forfilt = """
b = setup_b[:]

forfilt()
"""

forfiltCheating = '''
b = setup_b[:]

forfiltCheating()
'''

psycosetup = '''

import psyco
psyco.full()


'''

print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)


print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)

这是我的结果,

list comp =  6.56407690048
filtering =  5.64738512039
forfilter =  7.31555104256
forfiltCheating =  4.8994679451

now with psyco 

list comp =  8.0485959053
filtering =  7.79016900063
forfilter =  9.00477004051
forfiltCheating =  4.90830993652

我一定是 psyco 做错了什么,因为它实际上运行得更慢。

I took a stab at this. My solution is slower, but requires less memory overhead (i.e. doesn't create a new array). It might even be faster in some circumstances!

This code has been edited since its first posting

I had problems with timeit, I might be doing this wrong.

import timeit
setup = """

import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)


# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
   c[:] = [x for x in b if tokeep(x)]
listcomp()

b = setup_b[:]
def filt():
   c = filter(tokeep, b)
filt()

b = setup_b[:]
def forfilt():
   marked = (i for i, x in enumerate(b) if tokeep(x))
   shift = 0
   for n in marked:
      del b[n - shift]
      shift += 1
forfilt()

b = setup_b[:]
def forfiltCheating():
   marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))

   shift = 0
   for n in marked:
      del b[n - shift]
      shift += 1
forfiltCheating()

"""

listcomp = """
b = setup_b[:]

listcomp()
"""

filt = """
b = setup_b[:]

filt()
"""

forfilt = """
b = setup_b[:]

forfilt()
"""

forfiltCheating = '''
b = setup_b[:]

forfiltCheating()
'''

psycosetup = '''

import psyco
psyco.full()


'''

print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)


print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)

And here are the results

list comp =  6.56407690048
filtering =  5.64738512039
forfilter =  7.31555104256
forfiltCheating =  4.8994679451

now with psyco 

list comp =  8.0485959053
filtering =  7.79016900063
forfilter =  9.00477004051
forfiltCheating =  4.90830993652

I must be doing something wrong with psyco, because it is actually running slower.

娇妻 2024-11-09 20:57:44

元素不会被列表理解复制,

这花了我一段时间才弄清楚。请参阅下面的示例代码,自行尝试不同的方法

代码

您可以指定复制列表元素所需的时间以及评估所需的时间。事实证明,复制时间与列表理解无关。

import time
import timeit
import numpy as np

def ObjectFactory(time_eval, time_copy):
    """
    Creates a class

    Parameters
    ----------
    time_eval : float
        time to evaluate (True or False, i.e. keep in list or not) an object
    time_copy : float
        time to (shallow-) copy an object. Used by list comprehension.

    Returns
    -------
    New class with defined copy-evaluate performance

    """
    class Object:

        def __init__(self, id_, keep):
            self.id_ = id_
            self._keep = keep

        def __repr__(self):
            return f"Object({self.id_}, {self.keep})"

        @property
        def keep(self):
            time.sleep(time_eval)
            return self._keep

        def __copy__(self):  # list comprehension does not copy the object
            time.sleep(time_copy)
            return self.__class__(self.id_, self._keep)

    return Object

def remove_items_from_list_list_comprehension(lst):
    return [el for el in lst if el.keep]

def remove_items_from_list_new_list(lst):
    new_list = []
    for el in lst:
        if el.keep:
            new_list += [el]
    return new_list

def remove_items_from_list_new_list_by_ind(lst):
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    return [lst[ee] for ee in new_list_inds]

def remove_items_from_list_del_elements(lst):
    """WARNING: Modifies lst"""
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    for ind in new_list_inds[::-1]:
        if not lst[ind].keep:
            del lst[ind]

if __name__ == "__main__":
    ClassSlowCopy = ObjectFactory(time_eval=0, time_copy=0.1)
    ClassSlowEval = ObjectFactory(time_eval=1e-8, time_copy=0)

    keep_ratio = .8
    n_runs_timeit = int(1e2)
    n_elements_list = int(1e2)

    lsts_to_tests = dict(
        list_slow_copy_remove_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_copy_keep_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_remove_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_keep_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
    )

    for lbl, lst in lsts_to_tests.items():
        print()
        for fct in [
            remove_items_from_list_list_comprehension,
            remove_items_from_list_new_list,
            remove_items_from_list_new_list_by_ind,
            remove_items_from_list_del_elements,
        ]:

            lst_loc = lst.copy()
            t = timeit.timeit(lambda: fct(lst_loc), number=n_runs_timeit)
            print(f"{fct.__name__}, {lbl}: {t=}")


输出

remove_items_from_list_list_comprehension, list_slow_copy_remove_many: t=0.0064229519994114526
remove_items_from_list_new_list, list_slow_copy_remove_many: t=0.006507338999654166
remove_items_from_list_new_list_by_ind, list_slow_copy_remove_many: t=0.006562008995388169
remove_items_from_list_del_elements, list_slow_copy_remove_many: t=0.0076057760015828535

remove_items_from_list_list_comprehension, list_slow_copy_keep_many: t=0.006243691001145635
remove_items_from_list_new_list, list_slow_copy_keep_many: t=0.007145451003452763
remove_items_from_list_new_list_by_ind, list_slow_copy_keep_many: t=0.007032064997474663
remove_items_from_list_del_elements, list_slow_copy_keep_many: t=0.007690364996960852

remove_items_from_list_list_comprehension, list_slow_eval_remove_many: t=1.2495998149970546
remove_items_from_list_new_list, list_slow_eval_remove_many: t=1.1657221479981672
remove_items_from_list_new_list_by_ind, list_slow_eval_remove_many: t=1.2621939050004585
remove_items_from_list_del_elements, list_slow_eval_remove_many: t=1.4632593330024974

remove_items_from_list_list_comprehension, list_slow_eval_keep_many: t=1.1344162709938246
remove_items_from_list_new_list, list_slow_eval_keep_many: t=1.1323430630000075
remove_items_from_list_new_list_by_ind, list_slow_eval_keep_many: t=1.1354237199993804
remove_items_from_list_del_elements, list_slow_eval_keep_many: t=1.3084568729973398

elements are not copied by list comprehension

this took me a while to figure out. See the example code below, to experiment yourself with different approaches

code

You can specify how long a list element takes to copy and how long it takes to evaluate. The time to copy is irrelevant for list comprehension, as it turned out.

import time
import timeit
import numpy as np

def ObjectFactory(time_eval, time_copy):
    """
    Creates a class

    Parameters
    ----------
    time_eval : float
        time to evaluate (True or False, i.e. keep in list or not) an object
    time_copy : float
        time to (shallow-) copy an object. Used by list comprehension.

    Returns
    -------
    New class with defined copy-evaluate performance

    """
    class Object:

        def __init__(self, id_, keep):
            self.id_ = id_
            self._keep = keep

        def __repr__(self):
            return f"Object({self.id_}, {self.keep})"

        @property
        def keep(self):
            time.sleep(time_eval)
            return self._keep

        def __copy__(self):  # list comprehension does not copy the object
            time.sleep(time_copy)
            return self.__class__(self.id_, self._keep)

    return Object

def remove_items_from_list_list_comprehension(lst):
    return [el for el in lst if el.keep]

def remove_items_from_list_new_list(lst):
    new_list = []
    for el in lst:
        if el.keep:
            new_list += [el]
    return new_list

def remove_items_from_list_new_list_by_ind(lst):
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    return [lst[ee] for ee in new_list_inds]

def remove_items_from_list_del_elements(lst):
    """WARNING: Modifies lst"""
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    for ind in new_list_inds[::-1]:
        if not lst[ind].keep:
            del lst[ind]

if __name__ == "__main__":
    ClassSlowCopy = ObjectFactory(time_eval=0, time_copy=0.1)
    ClassSlowEval = ObjectFactory(time_eval=1e-8, time_copy=0)

    keep_ratio = .8
    n_runs_timeit = int(1e2)
    n_elements_list = int(1e2)

    lsts_to_tests = dict(
        list_slow_copy_remove_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_copy_keep_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_remove_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_keep_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
    )

    for lbl, lst in lsts_to_tests.items():
        print()
        for fct in [
            remove_items_from_list_list_comprehension,
            remove_items_from_list_new_list,
            remove_items_from_list_new_list_by_ind,
            remove_items_from_list_del_elements,
        ]:

            lst_loc = lst.copy()
            t = timeit.timeit(lambda: fct(lst_loc), number=n_runs_timeit)
            print(f"{fct.__name__}, {lbl}: {t=}")


output

remove_items_from_list_list_comprehension, list_slow_copy_remove_many: t=0.0064229519994114526
remove_items_from_list_new_list, list_slow_copy_remove_many: t=0.006507338999654166
remove_items_from_list_new_list_by_ind, list_slow_copy_remove_many: t=0.006562008995388169
remove_items_from_list_del_elements, list_slow_copy_remove_many: t=0.0076057760015828535

remove_items_from_list_list_comprehension, list_slow_copy_keep_many: t=0.006243691001145635
remove_items_from_list_new_list, list_slow_copy_keep_many: t=0.007145451003452763
remove_items_from_list_new_list_by_ind, list_slow_copy_keep_many: t=0.007032064997474663
remove_items_from_list_del_elements, list_slow_copy_keep_many: t=0.007690364996960852

remove_items_from_list_list_comprehension, list_slow_eval_remove_many: t=1.2495998149970546
remove_items_from_list_new_list, list_slow_eval_remove_many: t=1.1657221479981672
remove_items_from_list_new_list_by_ind, list_slow_eval_remove_many: t=1.2621939050004585
remove_items_from_list_del_elements, list_slow_eval_remove_many: t=1.4632593330024974

remove_items_from_list_list_comprehension, list_slow_eval_keep_many: t=1.1344162709938246
remove_items_from_list_new_list, list_slow_eval_keep_many: t=1.1323430630000075
remove_items_from_list_new_list_by_ind, list_slow_eval_keep_many: t=1.1354237199993804
remove_items_from_list_del_elements, list_slow_eval_keep_many: t=1.3084568729973398
回眸一笑 2024-11-09 20:57:44
 import collections
 list1=collections.deque(list1)
 for i in list2:
   try:
     list1.remove(i)
   except:
     pass

而不是检查元素是否存在。使用“尝试除外”。
我猜这更快

 import collections
 list1=collections.deque(list1)
 for i in list2:
   try:
     list1.remove(i)
   except:
     pass

INSTEAD OF CHECKING IF ELEMENT IS THERE. USING TRY EXCEPT.
I GUESS THIS FASTER

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文