Best way to exhaust a list of lists
I have a list of lists, built from a 2d array:
I need to popleft a value from each list, in order, until all lists have been exhausted. For example:
4,3,4,9,2,82,5,4,23,3,56,7 for the lists above
Because this list can become enormous, I want to skip over empty queues, e.g.
My solution is to also have a deque of indexes of the queues that still have values, with a loop like this:
Loop continuously until we no longer have any indexes left in our queue of indexes to pop from. This will leave every deque in the list empty when it's done. The alternative is to simply iterate through the list of deques and remove the one at index_of_empty_queue once it is exhausted, but deleting items from a very large list is extremely slow, especially towards the start of the list.
Any thoughts on my approach, and whether there is a better one? I know popleft and append are O(1) for a deque; I just don't know the overall performance implications of using this method to essentially iterate through a list, with the ability to also remove items from the "list" (the outer deque) in O(1).
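If I'm reading the approach right, a minimal sketch of it looks like this (the function name and setup are my own, since the original code isn't shown):

```python
from collections import deque

def exhaust(rows):
    # One deque per inner list, plus a deque of the indexes that still hold values.
    queues = [deque(row) for row in rows]
    indexes = deque(i for i, q in enumerate(queues) if q)
    result = []
    # Keep cycling through the index queue until every inner deque is empty.
    while indexes:
        i = indexes.popleft()
        result.append(queues[i].popleft())
        if queues[i]:           # requeue the index only while values remain,
            indexes.append(i)   # so exhausted deques are skipped in O(1)
    return result
```

For example, `exhaust([[1, 2, 3], [4], [5, 6]])` returns `[1, 4, 5, 2, 6, 3]`.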
1 Answer
Benchmark with some solutions:
Input was 1000 lists with random lengths from 1000 to 2000.
- `queues` is your original solution (edit: which you had in your question but have now deleted).
- `queues_clean` is the same, but without indices, and with normal truth-value tests instead of length checks.
- `queues_optimized` is an optimized version of `queues_clean`.
- `iterators` is like `queues_optimized` but with iterators instead of queues.
- `iterators2` and `iterators3` are some other versions with iterators I tried, replacing the outer deque with something else.
- `columns` is a different approach. Think of the input data as rows. What you want is the concatenated columns. So prepare one list for each needed column, then spread every input row across the columns, and finish by collecting column by column.
- `chained_removers` mainly `zip`s all the lists. But it chains little remover iterables behind them, which remove their exhausted iterator and yield a marker which then also gets removed (from the values of the current "column"). It also uses an `OrderedDict` for its doubly-linked list, allowing O(1)-time deletions and subsequent O(length)-time iteration.
- `with_roundrobin` uses the roundrobin itertools recipe. Not sure it counts, as it "skips" exhausted iterators at a potentially very high cost, see below.
- `like_interleave_longest` is like `more_itertools.interleave_longest`, but optimized for producing a list. It doesn't skip exhausted inner lists, but I include it in the benchmark out of curiosity.

I had originally discarded the roundrobin solution because your question made it look like you had many very short (even empty) inner lists. And there it's terrible, for example for 10000 lists with random lengths from 1 to 5:
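The `columns` idea described above can be sketched roughly like this (a minimal reconstruction, not the answer's exact benchmark code):

```python
def columns(rows):
    # One output list per column; the widest row determines how many are needed.
    cols = [[] for _ in range(max(map(len, rows), default=0))]
    # Spread every input row across the columns.
    for row in rows:
        for col, value in zip(cols, row):
            col.append(value)
    # Finish by collecting column by column.
    return [value for col in cols for value in col]
```

For example, `columns([[1, 2, 3], [4], [5, 6]])` builds the columns `[1, 4, 5]`, `[2, 6]`, `[3]` and returns `[1, 4, 5, 2, 6, 3]`.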
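For reference, this is the roundrobin recipe as long given in the itertools documentation (newer versions of the docs use a slightly different implementation):

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Advance each iterator in turn; rebuild the cycle when one is exhausted.
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Drop the exhausted iterator from the rotation.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))
```

The `cycle(islice(...))` rebuild on every exhaustion is the "potentially very high cost" mentioned above when there are many short inner lists.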
Full code (Try it online!):