"Sorted 1-D iterator" based on a "2-D iterator" (Cartesian product of iterators)

Published 2024-11-05 23:14:27


I'm looking for a clean way to do this in Python:

Let's say I have two iterators "iter1" and "iter2": perhaps a prime number generator, and itertools.count(). I know a priori that both are infinite and monotonically increasing. Now I want to take some simple operation of two args "op" (perhaps operator.add or operator.mul), and calculate every element of the first iterator with every element of the next, using said operation, then yield them one at a time, sorted. Obviously, this is an infinite sequence itself. (As mentioned in a comment by @RyanThompson: this would be called the Cartesian Product of these sequences... or, more exactly, the 1d-sort of that product.)

What is the best way to:

  • wrap up "iter1", "iter2", and "op" in an iterable that itself yields the values in monotonically increasing order.

Allowable simplifying assumptions:

  • If it helps, we can assume op(a,b) >= a and op(a,b) >= b.
  • If it helps, we can assume op(a,b) > op(a,c) for all b > c.

Also allowable:

  • Also acceptable would be an iterator that yields values in "generally increasing" order... by which I mean the iterable could occasionally give me a number less than a previous one, but it would somehow make "side information" available (as via a method on the object) that would say "I'm not guaranteeing the next value I give you will be greater than the one I just gave you, but I AM SURE that all future values will at least be greater than N."... and "N" itself is monotonically increasing.

The only way I can think to do this is a sort of "diagonalization" process, where I keep an increasing number of partially processed iterables around, and "look ahead" for the minimum of all the possible next() values, and yield that. But this weird agglomeration of a heapq and a bunch of deques just seems outlandish, even before I start to code it.

Please: do not base your answer on the fact that my examples mentioned primes or count().... I have several uses for this very concept that are NOT related to primes and count().


UPDATE: OMG! What a great discussion! And some great answers with really thorough explanations. Thanks so much. StackOverflow rocks; you guys rock.

I'm going to delve in to each answer more thoroughly soon, and give the sample code a kick in the tires. From what I've read so far, my original suspicions are confirmed that there is no "simple Python idiom" to do this. Rather, by one way or another, I can't escape keeping all yielded values of iter1 and iter2 around indefinitely.

FWIW: here's an official "test case" if you want to try your solutions.

import operator

def powers_of_ten():
    n = 0
    while True:
        yield 10**n
        n += 1

def series_of_nines():
    yield 1
    n = 1
    while True:
        yield int("9"*n)
        n += 1

op = operator.mul
iter1 = powers_of_ten()
iter2 = series_of_nines()

# given (iter1, iter2, op), create an iterator that yields:
# [1, 9, 10, 90, 99, 100, 900, 990, 999, 1000, 9000, 9900, 9990, 9999, 10000, ...]
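For reference, the expected output above can be produced by a heapq-based walk of the product grid. This is only a minimal sketch, assuming op is nondecreasing in both arguments (which holds for the examples here); the name `sorted_op_product` is illustrative, not from the question:

```python
import heapq

def sorted_op_product(op, iter1, iter2):
    # Cached prefixes of both input iterators (they must be kept indefinitely).
    a, b = [next(iter1)], [next(iter2)]
    # Heap entries are (value, i, j); the smallest value is always on top.
    heap = [(op(a[0], b[0]), 0, 0)]
    while True:
        val, i, j = heapq.heappop(heap)
        yield val
        # Move down the current column; extend the cached prefix if needed.
        if i + 1 == len(a):
            a.append(next(iter1))
        heapq.heappush(heap, (op(a[i + 1], b[j]), i + 1, j))
        # Only a top-row cell opens a new column, so each cell is pushed exactly once.
        if i == 0:
            b.append(next(iter2))
            heapq.heappush(heap, (op(a[0], b[j + 1]), 0, j + 1))
```

Fed the test case above, its first fifteen values match the expected list.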


Comments (6)

千年*琉璃梦 2024-11-12 23:14:27
import heapq
import itertools
import operator


def increasing(fn, left, right):
    """
    Given two never decreasing iterators produce another iterator
    resulting from passing the value from left and right to fn.
    This iterator should also be never decreasing.
    """
    # Imagine an infinite 2D grid.
    # Each column corresponds to an entry from right.
    # Each row corresponds to an entry from left.
    # Each cell corresponds to applying fn to those two values.

    # If the number of columns were finite, then we could easily solve
    # this problem by keeping track of our current position in each column;
    # in each iteration, we'd take the smallest value, report it, and then
    # move down in that column. This works because the values must increase
    # as we move down the column. That means the current set of values
    # under consideration must include the lowest value not yet reported.

    # To extend this to infinite columns, at any point we always track a finite
    # number of columns. The last column currently tracked is always in the top row;
    # if it moves down from the top row, we add a new column which starts at the top row.
    # Because the values are increasing as we move to the right, we know that
    # this last column is always lower than any columns that come after it.

    # Due to infinities, we need to keep track of all
    # items we've ever seen. So we put them in this list
    # The list contains the first part of the incoming iterators that
    # we have explored
    left_items = [next(left)]
    right_items = [next(right)]

    # We use a heap data structure; it allows us to efficiently
    # find the lowest of all values under consideration.
    heap = []

    def add_value(left_index, right_index):
        """
        Add the value result from combining the indexed attributes
        from the two iterators. Assumes that the values have already
        been copied into the lists
        """
        value = fn( left_items[left_index], right_items[right_index] )
        # the value on the heap has the index and value.
        # since the value is first, low values will be "first" on the heap
        heapq.heappush( heap, (value, left_index, right_index) )

    # we know that every other value must be larger than
    # this one
    add_value(0,0)

    # I assume the incoming iterators are infinite
    while True:
        # fetch the lowest of all values under consideration
        value, left_index, right_index = heapq.heappop(heap)

        # produce it
        yield value

        # move down the column
        if left_index + 1 == len(left_items):
            left_items.append(next(left))

        add_value(left_index+1, right_index)

        # if this was the first row in this column, add another column
        if left_index == 0:
            right_items.append( next(right) )
            add_value(0, right_index+1)






def fib():
    a = 1
    b = 1
    while True:
        yield a
        a,b = b,a+b



r = increasing(operator.add, fib(), itertools.count() )
for x in range(100):
    print(next(r))
灵芸 2024-11-12 23:14:27


Define the sequences as:

a1 <= a2 <= a3 ...
b1 <= b2 <= b3 ...

Let a1b1 mean op(a1,b1) for short.

Based on your allowable assumptions (very important) you know the following:

max(a1, b1) <= a1b1 <= a1b2 <= a1b3 ...
   <=
max(a2, b1) <= a2b1 <= a2b2 <= a2b3 ...
   <=
max(a3, b1) <= a3b1 <= a3b2 <= a3b3 ...
    .     .
    .      .
    .       .

You'll have to do something like:

Generate a1b1. You know that if you continue increasing the b variables, you will only get higher values. The question now is: is there a smaller value by increasing the a variables? Your lower bound is min(a1, b1), so you will have to increase the a values until min(ax,b1) >= a1b1. Once you hit that point, you can find the smallest value from anb1 where 1 <= n <= x and yield that safely.

You then will have multiple horizontal chains that you'll have to keep track of. Every time you have a value that goes past min(ax,b1), you'll have to increase x (adding more chains) until min(ax,b1) is larger than it before safely emitting it.

Just a starting point... I don't have time to code it at the moment.

EDIT: Oh heh that's exactly what you already had. Well, without more info, this is all you can do, as I'm pretty sure that mathematically, that's what is necessary.

EDIT2: As for your 'acceptable' solution: you can just yield a1bn in increasing order of n, returning min(a1,b1) as N =P. You'll need to be more specific. You speak as if you have a heuristic of what you generally want to see, the general way you want to progress through both iterables, but without telling us what it is I don't know how one could do better.


UPDATE: Winston's is good but makes an assumption that the poster didn't mention: that op(b,c) > op(a,c) if b > a. However, we do know that op(a,b)>=a and op(a,b)>=b.

Here is my solution which takes that second assumption but not the one Winston took. Props to him for the code structure, though:

def increasing(fn, left, right):
    left_items = [next(left)]
    right_items = [next(right)]

    #columns are (column value, right index)
    columns = [(fn(left_items[0],right_items[0]),0)]

    while True:
        #find the current smallest value
        min_col_index = min(range(len(columns)), key=lambda i: columns[i][0])

        #generate columns until it's impossible to get a smaller value
        while right_items[0] <= columns[min_col_index][0] and \
              left_items[-1] <= columns[min_col_index][0]:
            next_left = next(left)

            left_items.append(next_left)
            columns.append((fn(next_left, right_items[0]),0))

            if columns[-1][0] < columns[min_col_index][0]:
                min_col_index = len(columns)-1

        #yield the smallest value
        yield columns[min_col_index][0]

        #move down that column
        val, right_index = columns[min_col_index]

        #make sure that right value is generated:
        while right_index+1 >= len(right_items):
            right_items.append(next(right))

        columns[min_col_index] = (fn(left_items[min_col_index],right_items[right_index+1]),
                                  right_index+1)
        #repeat                

For a (pathological) input that demonstrates the difference, consider:

def pathological_one():
    cur = 0
    while True:
        yield cur
        cur += 100

def pathological_two():
    cur = 0
    while True:
        yield cur
        cur += 100

lookup = [
    [1,   666, 500],
    [666, 666, 666],
    [666, 666, 666],
    [666, 666, 666]]

def pathological_op(a, b):
    if a >= 300 or b >= 400: return 1005
    return lookup[b // 100][a // 100]

r = increasing(pathological_op, pathological_one(), pathological_two())
for x in range(15):
    print(next(r))

Winston's answer gives:

>>> 
1
666
666
666
666
500
666
666
666
666
666
666
1005
1005
1005

While mine gives:

>>> 
1
500
666
666
666
666
666
666
666
666
666
666
1005
1005
1005
故人的歌 2024-11-12 23:14:27


Let me start with an example of how I would solve this intuitively.

Because reading code inline is a little tedious, I'll introduce some notation:

Notation

  • i1 will represent iter1. i1[0] will represent the first element of iter1. Same for iter2.
  • ※ will represent the op operator

Intuitive solution

By using simplifying assumption 2, we know that i1[0] ※ i2[0] is the smallest element that will ever be yielded from your final iterator. The next element would be the smaller of i1[0] ※ i2[1] and i1[1] ※ i2[0].

Assuming i1[0] ※ i2[1] is smaller, you would yield that element. Next, you would yield the smaller of i1[1] ※ i2[0], i1[0] ※ i2[2], and i1[1] ※ i2[1].

Expression as traversal of a DAG

What you have here is a graph traversal problem. First, think of the problem as a tree. The root of the tree is i1[0] ※ i2[0]. This node, and each node below it, has two children. The two children of i1[x] ※ i2[y] are the following: one child is i1[x+1] ※ i2[y], and the other child is i1[x] ※ i2[y+1]. Based on your second assumption, we know that i1[x] ※ i2[y] is less than both of its children.

(In fact, as Ryan mentions in a comment, this is a directed acyclic graph, or DAG. Some "parents" share "children" with other "parent" nodes.)

Now, we need to keep a frontier - a collection of nodes that could be next to be returned. After returning a node, we add both its children to the frontier. To select the next node to visit (and return from your new iterator), we compare the values of all the nodes in the frontier. We take the node with the smallest value and we return it. Then, we again add both of its child nodes to the frontier. If the child is already in the frontier (added as the child of some other parent), just ignore it.

Storing the frontier

Because you are primarily interested in the value of the nodes, it makes sense to store these nodes indexed by value. As such, it may be in your interest to use a dict. Keys in this dict should be the values of nodes. Values in this dict should be lists containing individual nodes. Because the only identifying information in a node is the pair of operands, you can store individual nodes as a two-tuple of operands.

In practice, after a few iterations, your frontier may look like the following:

>>> frontier
{1: [(2, 3), (2, 4)], 2: [(3, 5), (5, 4)], 3: [(1, 6)], 4: [(6, 3)]}

Other implementation notes

Because iterators don't support random access, you'll need to hang on to values that are produced by your first two iterators until they are no longer needed. You'll know that a value is still needed if it is referenced by any value in your frontier. You'll know that a value is no longer needed once all nodes in the frontier reference values later/greater than the one you've stored. For example, i1[20] is no longer needed when nodes in your frontier reference only i1[21], i1[25], i1[33], ...

As mentioned by Ryan, each value from each iterator will be used an infinite number of times. Thus, every value produced will need to be saved.

Not practical

Unfortunately, in order to assure that elements are returned only in increasing order, the frontier will grow without bound. Your memoized values will probably also take a significant amount of space and likewise grow without bound. This may be something you can address by making your problem less general, but this should be a good starting point.
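A minimal sketch of this frontier traversal, using a set of (i, j) index pairs instead of the value-keyed dict (a set deduplicates shared children automatically); it assumes op is nondecreasing in both arguments, and `frontier_product` is an illustrative name:

```python
def frontier_product(op, iter1, iter2):
    # Cached prefixes of the two inputs; as noted above, they are kept indefinitely.
    a, b = [next(iter1)], [next(iter2)]
    frontier = {(0, 0)}  # set of (i, j) index pairs into a and b
    while True:
        # Visit the frontier node whose cell value is smallest.
        i, j = min(frontier, key=lambda ij: op(a[ij[0]], b[ij[1]]))
        frontier.remove((i, j))
        yield op(a[i], b[j])
        # Extend the cached prefixes so both children can be evaluated.
        if i + 1 == len(a):
            a.append(next(iter1))
        if j + 1 == len(b):
            b.append(next(iter2))
        frontier.add((i + 1, j))
        frontier.add((i, j + 1))
```

The linear `min` scan over the frontier is what the value-keyed dict (or a heap) would speed up.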

墨洒年华 2024-11-12 23:14:27


So you basically want to take two monotonically increasing sequences, and then (lazily) compute the multiplication (or addition, or another operation) table between them, which is a 2-D array. Then you want to put the elements of that 2-D array in sorted order and iterate through them.

In general, this is impossible. However, if your sequences and operation are such that you can make certain guarantees about the rows and columns of the table, then you can make some progress. For example, let's assume that your sequences are monotonically-increasing sequences of positive integers only, and that the operation is multiplication (as in your example). In this case, we know that every row and column of the array is a monotonically-increasing sequence. In this case, you do not need to compute the entire array, but rather only parts of it. Specifically, you must keep track of the following:

  • How many rows you have ever used
  • The number of elements you have taken from each row that you have used
  • Every element from either input sequence that you have ever used, plus one more from each

To compute the next element in your iterator, you must do the following:

  • For each row that you have ever used, compute the "next" value in that row. For example, if you have used 5 values from row 1, then compute the 6th value (i=1, j=6) by taking the 1st value from the first sequence and the 6th value from the second sequence (both of which you have cached) and applying the operation (multiplication) to them. Also, compute the first value in the first unused row.
  • Take the minimum of all the values you computed. Yield this value as the next element in your iterator
  • Increment the counter for the row from which you sampled the element in the previous step. If you took the element from a new, unused row, you must increment the count of the number of rows you have used, and you must create a new counter for that row initialized to 1. If necessary, you must also compute more values of one or both input sequences.

This process is kind of complex, and in particular notice that to compute N values, you must in the worst case save an amount of state proportional to the square root of N. (Edit: sqrt(N) is actually the best case.) This is in stark contrast to a typical generator, which only requires constant space to iterate through its elements regardless of length.

In summary, you can do this under certain assumptions, and you can provide a generator-like interface to it, but it cannot be done in a "streaming" fashion, because you need to save a lot of state in order to iterate through the elements in the correct order.
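The steps above can be sketched as follows. This is a rough implementation under the same assumptions (op nondecreasing in both arguments); `table_rows_product` and the variable names are illustrative:

```python
def table_rows_product(op, iter1, iter2):
    a, b = [], []   # cached prefixes of the two input sequences
    used = []       # used[i] = how many values have been taken from row i
    while True:
        # Make sure the first value of the first unused row is available.
        while len(a) <= len(used):
            a.append(next(iter1))
        # One candidate per started row, plus the first cell of the next row.
        candidates = []
        for i, j in enumerate(used + [0]):
            while len(b) <= j:
                b.append(next(iter2))
            candidates.append((op(a[i], b[j]), i))
        val, i = min(candidates)
        # Advance the winning row's counter (starting the row if it is new).
        if i == len(used):
            used.append(1)
        else:
            used[i] += 1
        yield val
```

Each yield scans one candidate per active row, which mirrors the O(sqrt(N))-state argument above.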

街角卖回忆 2024-11-12 23:14:27


Use generators, which are just iterators written as functions that yield results. In this case you can write generators for iter1 and iter2 and another generator to wrap them and yield their results (or do calculations with them, or the history of their results) as you go.

From my reading of the question, you want something like this, which will calculate every element of the first iterator with every element of the next using said operation. You also state you want some way to wrap up "iter1", "iter2", and "op" in an iterable that itself yields the values in monotonically increasing output. I propose generators offer a simple solution to such a problem.

import itertools
import operator

def prime_gen():
    D, q = {}, 2
    while True:
        if q not in D:
            yield q
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]
        q += 1

def infinite_gen(op, iter1, iter2):
    while True:
        yield op(next(iter1), next(iter2))

>>> gen = infinite_gen(operator.mul, prime_gen(), itertools.count())
>>> next(gen)
0
>>> next(gen)
3
>>> next(gen)
10

Generators offer a lot of flexibility, so it should be fairly easy to write iter1 and iter2 as generators that return values you want in the order you want. You could also consider using coroutines, which let you send values into a generator.

不顾 2024-11-12 23:14:27


Discussion in other answers observes that there is potentially infinite storage required no matter what the algorithm, since every a[n] must remain available for each new b[n]. If you remove the restriction that the input be two iterators and instead only require that they be sequences (indexable or merely something that can be regenerated repeatedly) then I believe all of your state suddenly collapses to one number: The last value you returned. Knowing the last result value you can search the output space looking for the next one. (If you want to emit duplicates properly then you may need to also track the number of times the result has been returned)

With a pair of sequences you have a simple recurrence relation:

result(n) = f(seq1, seq2, result(n-1))

where f(seq1, seq2, p) searches for the minimum value in the output space q such that q > p. In practical terms you'd probably make the sequences memoized functions and choose your search algorithm to avoid thrashing the pool of memoized items.
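A bounded-search sketch of this idea, assuming op(a,b) >= max(a,b) and op strictly increasing in its second argument (the question's allowed assumptions); `next_result` and the index-to-value signature for the memoized sequences are illustrative, not from the answer:

```python
def next_result(op, a, b, prev):
    # a, b: index -> value for two nondecreasing sequences (e.g. memoized functions).
    # Returns the smallest table value strictly greater than prev. Duplicate table
    # values are collapsed; emitting them faithfully would need a repeat count too.
    j = 0
    while op(a(0), b(j)) <= prev:   # row 0 strictly increases in j, so this terminates
        j += 1
    best = op(a(0), b(j))
    i = 1
    while a(i) <= best:             # op(x, y) >= x prunes whole rows past best
        j = 0
        while b(j) <= best:         # op(x, y) >= y prunes the tail of each row
            v = op(a(i), b(j))
            if prev < v < best:
                best = v
            j += 1
        i += 1
    return best
```

Repeated calls, each seeded with the previous result, walk the sorted output; only the last value is carried between calls, though the memoized sequences themselves still grow.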
