Best way to exhaust a list of lists

Posted on 2025-01-31 13:47:00

I have a list of lists, built from a 2d array:

I need to popleft a value from each list, in order, until all lists have been exhausted. For example:

   4,3,4,9,2,82,5,4,23,3,56,7 for the lists above

Because this list can become enormous, I want to skip over empty queues, e.g.

My solution is to also keep a deque of the indexes of the queues that still have values, with a loop like this:

 Loop continuously until we no longer have any indexes left in our queue of indexes to pop from. This will leave every deque in the list empty when it's done. The alternative is to simply iterate through the list of deques and remove[index_of_empty_queue] when one is exhausted, but deleting items from a very large list is extremely slow, especially towards the start of the list.

Any thoughts on my approach, and whether there is a better one? I know popleft and append are O(1) for a deque; I just don't know the overall performance implications of using this method to essentially iterate through a list while also being able to remove items from the "list" (a deque) in O(1).
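A minimal sketch of the index-queue approach just described; the function name is illustrative, and the sample input is an assumption chosen so the round-robin order reproduces the example sequence above:

```python
from collections import deque

def drain_in_order(rows):
    # One deque per inner list, plus a queue of indexes of the deques
    # that still hold values (empty rows are skipped up front).
    queues = [deque(row) for row in rows]
    pending = deque(i for i, q in enumerate(queues) if q)
    result = []
    while pending:
        i = pending.popleft()
        result.append(queues[i].popleft())  # O(1) pop from the left
        if queues[i]:                       # re-enqueue only while values remain
            pending.append(i)
    return result

print(drain_in_order([[4, 2, 23], [3, 82, 3], [4, 5, 56], [9, 4, 7]]))
# -> [4, 3, 4, 9, 2, 82, 5, 4, 23, 3, 56, 7]
```

Because an exhausted deque is simply never re-enqueued, empty queues drop out in O(1) with no list deletions.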

Comments (1)

屋檐 2025-02-07 13:47:00

Benchmark with some solutions:

Best three from ten runs:
 145 ms   147 ms   148 ms  columns
 202 ms   203 ms   204 ms  chained_removers
 219 ms   220 ms   221 ms  like_interleave_longest
 302 ms   303 ms   304 ms  with_roundrobin
 313 ms   314 ms   314 ms  iterators
 330 ms   331 ms   333 ms  iterators3
 336 ms   338 ms   338 ms  iterators2
 366 ms   368 ms   369 ms  queues_optimized
 471 ms   474 ms   475 ms  queues_clean
 737 ms   741 ms   746 ms  queues

Input was 1000 lists with random lengths from 1000 to 2000.

  • queues is your original solution (edit: which you had in your question but now deleted).
  • queues_clean is the same, but without indices, and normal truth value tests instead of length checks.
  • queues_optimized is an optimized version of queues_clean.
  • iterators is like queues_optimized but with iterators instead of queues.
  • iterators2 and iterators3 are some other versions with iterators I tried, replacing the outer deque with something else.
  • columns is a different approach. Think of the input data as rows. What you want is the concatenated columns. So prepare one list for each needed column, and then spread every input row across the columns. Finish by collecting by columns.
  • chained_removers mainly zips all the lists. But it chains little remover iterables behind them, which remove their exhausted iterator and yield a marker which then also gets removed (from the values of the current "column"). Also uses an OrderedDict for its doubly-linked list, allowing O(1) time deletions and subsequent O(length) time iteration.
  • with_roundrobin uses the roundrobin itertools recipe. Not sure it counts, as it "skips" exhausted iterators at a potentially very high cost, see below.
  • like_interleave_longest is like more_itertools.interleave_longest, but optimized for producing a list. It doesn't skip exhausted inner lists, but I include it in the benchmark out of curiosity.
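To make the columns idea concrete, here is a tiny self-contained sketch (the function name and sample data are illustrative): each input row spreads its values across per-column buckets, and the buckets are then concatenated in order.

```python
from collections import deque

def columns_demo(rows):
    # One bucket per column position; the longest row decides the bucket count.
    buckets = [[] for _ in range(max(map(len, rows)))]
    for row in rows:
        # map() pairs each value with its column bucket; the zero-maxlen
        # deque just consumes the map object quickly, for its side effects.
        deque(map(list.append, buckets, row), 0)
    out = []
    for bucket in buckets:
        out += bucket  # concatenate the columns in order
    return out

print(columns_demo([[1, 2], [3], [4, 5, 6]]))
# -> [1, 3, 4, 2, 5, 6]
```

Short rows simply contribute to fewer buckets, since `map` stops at the shorter of its two argument iterables, so no skipping logic is needed at all.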

I had originally discarded the roundrobin solution because your question made it look like you had many very short (even empty) inner lists. And there it's terrible, for example for 10000 lists with random lengths from 1 to 5:

   3 ms     3 ms     3 ms  like_interleave_longest
   5 ms     6 ms     6 ms  columns
   8 ms     8 ms     8 ms  iterators
   8 ms     8 ms     8 ms  iterators2
   8 ms     8 ms     8 ms  iterators3
   9 ms     9 ms    10 ms  queues_optimized
  12 ms    12 ms    13 ms  queues_clean
  18 ms    18 ms    19 ms  queues
  26 ms    27 ms    29 ms  chained_removers
3642 ms  3750 ms  3812 ms  with_roundrobin

Full code (Try it online!):

def queues(data):
    data_q = [deque(i) for i in data ]
    data_i = deque([i for i in range(len(data_q))])
    return_list = []
    while len(data_i) > 0:
        index = data_i.popleft()
        return_list.append(data_q[index].popleft())
        if len(data_q[index]) != 0:
            data_i.append(index)
    return return_list

def queues_clean(data):
    queues = deque(map(deque, data))
    result = []
    while queues:
        queue = queues.popleft()
        result.append(queue.popleft())
        if queue:
            queues.append(queue)
    return result

def queues_optimized(data):
    queues = deque(map(deque, data))
    queues_pop = queues.popleft
    queues_push = queues.append
    result = []
    result_append = result.append
    while queues:
        queue = queues_pop()
        result_append(queue.popleft())
        if queue:
            queues_push(queue)
    return result

def iterators(data):
    iterators = deque(map(iter, data))
    iterators_pop = iterators.popleft
    iterators_push = iterators.append
    result = []
    result_append = result.append
    next_value = next
    while iterators:
        iterator = iterators_pop()
        try:
            result_append(next_value(iterator))
            iterators_push(iterator)
        except StopIteration:
            pass
    return result

def iterators2(data):
    iterators = list(map(iter, data))
    result = []
    result_append = result.append
    next_value = next
    while iterators:
        alive = []
        keep = alive.append
        for iterator in iterators:
            try:
                result_append(next_value(iterator))
                keep(iterator)
            except StopIteration:
                pass
        iterators = alive
    return result

def iterators3(data):
    iterators = list(map(iter, data))
    result = []
    result_append = result.append
    next_value = next
    while iterators:
        keep = 0
        for iterator in iterators:
            try:
                result_append(next_value(iterator))
                iterators[keep] = iterator
                keep += 1
            except StopIteration:
                pass
        del iterators[keep:]
    return result

def columns(data):
    columns = [[] for _ in range(max(map(len, data)))]
    for row in data:
        deque(map(list.append, columns, row), 0)
    result = []
    for column in columns:
        result += column
    return result

def chained_removers(data):
    marker = object()
    def remover(i):
        del iterators[i]
        yield marker
    iterators = OrderedDict()
    for i, d in enumerate(data):
        iterators[i] = chain(d, remover(i))
    result = []
    while alive := len(iterators):
        for values in zip(*iterators.values()):
            if len(iterators) < alive:
                result += compress(values, map(is_not, values, repeat(marker)))
                break
            result += values
    return result

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Remove the iterator we just exhausted from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

def with_roundrobin(data):
    return list(roundrobin(*data))

def like_interleave_longest(data):
    marker = object()
    return [x
            for xs in zip_longest(*data, fillvalue=marker)
            for x in xs
            if x is not marker]

funcs = [
    queues,
    queues_clean,
    queues_optimized,
    iterators,
    iterators2,
    iterators3,
    columns,
    chained_removers,
    with_roundrobin,
    like_interleave_longest,
]

from timeit import default_timer as time
from random import randint, shuffle
from bisect import insort
from collections import deque, OrderedDict
from itertools import cycle, islice, chain, compress, repeat, zip_longest
from operator import is_not

# Data for the "many short lists" benchmark; overridden by the second
# definition below (comment that one out to reproduce the second table).
data = [list(range(1000, 1000 + randint(1, 5)))
        for _ in range(10000)]

data = [list(range(1000, 1000 + randint(1000, 2000)))
        for _ in range(1000)]

expect = funcs[0](data)
for func in funcs:
    assert func(data) == expect

times = {func: [] for func in funcs}
for _ in range(10):
    shuffle(funcs)
    for func in funcs:
        t0 = time()
        func(data)
        t1 = time()
        insort(times[func], t1 - t0)
for func in sorted(funcs, key=times.get):
    print(*('%4d ms ' % (t * 1e3) for t in times[func][:3]), func.__name__)