I've got some example Python code that I need to mimic in C++. I do not require any specific solution (such as co-routine based yield solutions, although they would be acceptable answers as well), I simply need to reproduce the semantics in some manner.
Python
This is a basic sequence generator, clearly too large to store a materialized version.
def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)
The goal is to maintain two instances of the sequence above and iterate over them in semi-lockstep, but in chunks. In the example below, first_pass uses the sequence of pairs to initialize the buffer, and second_pass regenerates the exact same sequence and processes the buffer again.
def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()
    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    ... repeat ...
C++
The only solution I can find in C++ is to mimic yield with C++ coroutines, but I haven't found any good reference on how to do this. I'm also interested in alternative (non-general) solutions for this problem. I do not have enough memory budget to keep a copy of the sequence between passes.
Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.
You simply need to understand what a generator does:
In your trivial example, it's easy enough. Conceptually:
Of course, we wrap this as a proper class:
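A minimal sketch of what such a wrapper class might look like, assuming a single-pass input iterator with a configurable bound n. The name PairIterator and the bound parameter are illustrative and not taken from the original answer; passing n = 1ULL << 32 would correspond to the Python range(2**32).

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>

// Hypothetical input iterator producing (i, j) for 0 <= i, j < n
// without ever materializing the sequence.
class PairIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<std::uint64_t, std::uint64_t>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = const value_type&;

    // Begin iterator for an n x n sequence.
    explicit PairIterator(std::uint64_t n) : n_(n), cur_(0, 0), done_(n == 0) {}
    // Default-constructed iterator acts as the end sentinel.
    PairIterator() : n_(0), cur_(0, 0), done_(true) {}

    reference operator*() const { return cur_; }
    pointer operator->() const { return &cur_; }

    // The "state machine": advance j, and carry into i when j wraps.
    PairIterator& operator++() {
        if (++cur_.second == n_) {
            cur_.second = 0;
            if (++cur_.first == n_) done_ = true;
        }
        return *this;
    }
    PairIterator operator++(int) {
        PairIterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const PairIterator& a, const PairIterator& b) {
        if (a.done_ || b.done_) return a.done_ == b.done_;
        return a.cur_ == b.cur_;
    }
    friend bool operator!=(const PairIterator& a, const PairIterator& b) {
        return !(a == b);
    }

private:
    std::uint64_t n_;
    value_type cur_;
    bool done_;
};
```

Two independent instances can then be advanced side by side, each holding only its own (i, j) state, for example with a loop of the form `for (PairIterator it(n), end; it != end; ++it)`.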
So hum yeah... might be that C++ is a tad more verbose :)
In C++ there are iterators, but implementing an iterator isn't straightforward: one has to consult the iterator concepts and carefully design the new iterator class to implement them. Thankfully, Boost has an iterator_facade template which should help with implementing iterators and iterator-compatible generators.
Sometimes a stackless coroutine can be used to implement an iterator.
P.S. See also this article which mentions both a switch hack by Christopher M. Kohlhoff and Boost.Coroutine by Oliver Kowalke.
P.S. I think you can also write a kind of generator with lambdas:
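As a rough illustration of the lambda idea (the original snippet is not preserved here, so this is only a sketch under assumptions): a mutable lambda can keep i and j in its captures and return std::nullopt once the sequence is exhausted. The helper name make_pair_sequence and the bound n are hypothetical, and the code assumes C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: returns a callable that yields the next (i, j)
// on each call and std::nullopt once the n x n sequence is exhausted.
inline auto make_pair_sequence(std::uint64_t n) {
    // i and j live in the closure object, so every closure returned
    // by this function carries its own independent position.
    return [n, i = std::uint64_t{0}, j = std::uint64_t{0}]() mutable
               -> std::optional<std::pair<std::uint64_t, std::uint64_t>> {
        if (i == n) return std::nullopt;      // also covers n == 0
        auto result = std::make_pair(i, j);
        if (++j == n) {                       // inner loop wrapped
            j = 0;
            ++i;
        }
        return result;
    };
}
```

Calling make_pair_sequence twice gives two generators that can be advanced independently, which is what the two-pass scheme in the question needs.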
Or with a functor:
P.S. Here's a generator implemented with the Mordor coroutines:
Since Boost.Coroutine2 now supports it very well (I found it because I wanted to solve exactly the same yield problem), I am posting the C++ code that matches your original intention:
In this example, pair_sequence does not take additional arguments. If it needs to, std::bind or a lambda should be used to generate a function object that takes only one argument (of push_type) when it is passed to the coro_t::pull_type constructor.
All answers that involve writing your own iterator are completely wrong. Such answers entirely miss the point of Python generators (one of the language's greatest and most distinctive features). The most important thing about generators is that execution picks up where it left off. This does not happen with iterators. Instead, you must manually store state information such that when operator++ or operator* is called again, the right information is in place at the very beginning of the next function call. This is why writing your own C++ iterator is a gigantic pain, whereas generators are elegant and easy to read and write.
I don't think there is a good analog for Python generators in native C++, at least not yet (there is a rumor that yield will land in C++17). You can get something similar by resorting to a third-party library (e.g. Yongwei's Boost suggestion), or by rolling your own.
I would say the closest thing in native C++ is threads. A thread can maintain a suspended set of local variables, and can continue execution where it left off, very much like generators, but you need to roll a little bit of additional infrastructure to support communication between the generator object and its caller. E.g.
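The answer's original code is not preserved here; the following is only a sketch of how such thread-based infrastructure might look, assuming a one-value handshake over a mutex and a condition variable. The names ThreadGenerator, Yield, and the small-bound pair_sequence are hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical generator that runs its body on a worker thread and
// hands values to the caller one at a time.
template <typename T>
class ThreadGenerator {
public:
    using Yield = std::function<void(T)>;
    using Body = std::function<void(const Yield&)>;

    explicit ThreadGenerator(Body body)
        : worker_([this, body = std::move(body)] { run(body); }) {}

    ThreadGenerator(const ThreadGenerator&) = delete;
    ThreadGenerator& operator=(const ThreadGenerator&) = delete;

    // Cancels a still-suspended body (by unwinding it) and joins the thread.
    ~ThreadGenerator() {
        {
            std::lock_guard<std::mutex> lock(m_);
            cancelled_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Resumes the body until its next yield; nullopt means it has finished.
    std::optional<T> next() {
        std::unique_lock<std::mutex> lock(m_);
        if (finished_) return std::nullopt;
        demand_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return slot_.has_value() || finished_; });
        std::optional<T> out = std::move(slot_);
        slot_.reset();
        return out;
    }

private:
    struct Cancelled {};

    void run(const Body& body) {
        try {
            wait_for_demand();               // do nothing until first next()
            body([this](T value) {
                {
                    std::lock_guard<std::mutex> lock(m_);
                    slot_ = std::move(value);
                    demand_ = false;
                }
                cv_.notify_all();
                wait_for_demand();           // suspend inside the "yield"
            });
        } catch (const Cancelled&) {
            // Generator destroyed before the body ran to completion.
        }
        {
            std::lock_guard<std::mutex> lock(m_);
            finished_ = true;
        }
        cv_.notify_all();
    }

    void wait_for_demand() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return demand_ || cancelled_; });
        if (cancelled_) throw Cancelled{};
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::optional<T> slot_;
    bool demand_ = false;
    bool cancelled_ = false;
    bool finished_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

using Pair = std::pair<unsigned, unsigned>;

// The body reads like the Python generator: plain nested loops plus yield.
inline void pair_sequence(unsigned n, const ThreadGenerator<Pair>::Yield& yield) {
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            yield(Pair(i, j));
}
```

A generator would then be created with something like `ThreadGenerator<Pair> gen([](const ThreadGenerator<Pair>::Yield& y) { pair_sequence(2, y); });` and drained by calling gen.next() until it returns std::nullopt.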
This solution has several downsides though:
Using range-v3:
You should probably look at the generators in std::experimental in Visual Studio 2015, e.g.: https://blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/
I think that is exactly what you are looking for. Generators in general should become available in C++17; for now this is only an experimental Microsoft VC++ feature.
If you only need to do this for a relatively small number of specific generators, you can implement each as a class, where the member data is equivalent to the local variables of the Python generator function. Then you have a next function that returns the next thing the generator would yield, updating the internal state as it does so.
This is basically similar to how Python generators are implemented, I believe. The major difference is that they can remember an offset into the bytecode of the generator function as part of the "internal state", which means the generators can be written as loops containing yields. You would instead have to calculate the next value from the previous one. In the case of your pair_sequence, that's pretty trivial. It may not be for complex generators.
You also need some way of indicating termination. If what you're returning is "pointer-like", and NULL should not be a valid yieldable value, you could use a NULL pointer as a termination indicator. Otherwise you need an out-of-band signal.
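A minimal sketch of this approach, assuming a bool return value as the out-of-band termination signal and a configurable bound n. The names PairSequence and fill_chunk are illustrative only; fill_chunk shows how a pass might consume the sequence one buffer at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Pair = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical generator class: the members i_ and j_ play the role of
// the local variables of the Python generator function.
class PairSequence {
public:
    explicit PairSequence(std::uint64_t n) : n_(n) {}

    // Writes the next pair to `out` and returns true, or returns false
    // (leaving `out` untouched) once the sequence is exhausted.
    bool next(Pair& out) {
        if (i_ >= n_) return false;   // also covers n == 0
        out = Pair(i_, j_);
        if (++j_ == n_) {             // compute the next state from the previous
            j_ = 0;
            ++i_;
        }
        return true;
    }

private:
    std::uint64_t n_;
    std::uint64_t i_ = 0;
    std::uint64_t j_ = 0;
};

// Hypothetical chunked consumer: pulls up to buffer.size() pairs and
// returns how many were written.
inline std::size_t fill_chunk(PairSequence& seq, std::vector<Pair>& buffer) {
    std::size_t count = 0;
    while (count < buffer.size() && seq.next(buffer[count])) ++count;
    return count;
}
```

Two PairSequence objects constructed with the same bound replay the same sequence, so a first pass and a second pass can each pull a chunk in turn without either of them storing the sequence.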
Something like this is very similar:
Using operator() is only a question of what you want to do with this generator; you could also build it as a stream and make sure it adapts to an istream_iterator, for example.
Well, today I was also looking for an easy collection implementation under C++11. I was actually disappointed, because everything I found was either too far from things like Python generators or the C# yield operator, or too complicated.
The purpose is to make a collection that emits its items only when they are required.
I wanted it to be like this:
I found this post; IMHO the best answer was the one about Boost.Coroutine2 by Yongwei Wu, since it is the nearest to what the author wanted.
It is worth learning Boost coroutines, and I'll perhaps do so on a weekend. But so far I'm using my own very small implementation. I hope it helps someone else.
Below is an example of use, followed by the implementation.
Example.cpp
Generator.h
It is possible to get yield-like behavior with a simple goto statement. As it is simple, I wrote it in C.
All you have to do in your generator function is:
Example:
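The original listing is not preserved here, so the following is only a sketch of the goto idea, written in C++ rather than C and using an explicit state struct instead of static variables so that several generators can coexist. The names PairGenState and next_pair, and the bound n, are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical state record: loop counters plus a resume marker
// (0 = not started, 1 = suspended at the yield point, 2 = finished).
struct PairGenState {
    std::uint32_t i = 0;
    std::uint32_t j = 0;
    int resume = 0;
};

// Produces the next (a, b) with 0 <= a, b < n. Returns false when done.
inline bool next_pair(PairGenState& s, std::uint32_t n,
                      std::uint32_t& a, std::uint32_t& b) {
    if (s.resume == 2) return false;          // already exhausted
    if (s.resume == 1) goto resume_point;     // jump back into the loops

    for (s.i = 0; s.i < n; ++s.i) {
        for (s.j = 0; s.j < n; ++s.j) {
            a = s.i;
            b = s.j;
            s.resume = 1;
            return true;                      // the "yield"
        resume_point:;                        // execution continues here
        }
    }
    s.resume = 2;
    return false;
}
```

Jumping into the loop body is permitted here because no variable declarations are bypassed: all loop state lives in the struct, which is what lets the loops pick up exactly where they stopped.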
Something like this:
Example use:
Will print the numbers from 0 to 99
This answer works in C (and hence I think works in C++ too)
This is a simple, non-object-oriented way to mimic a generator. It worked as expected for me.
Edit: Previous code was erroneous and I have updated it.
Note: This code can be improved to use just uint32_t instead of uint64_t for the given question.
I came across this by chance. It is a very old post, but this may help others who read it in the future.
I don't quite understand the problem, it seems, because I don't see the problem.
Basically, we need to save the state of i and j. Why not just use a simple C++ class?
The only point is knowing where to stop. In my example I actually return a std::pair: the first element is a bool indicating whether there are more elements, and the second is the real value.
And so we can use it like this:
Output:
This is just normal C++. Now, does it have to be a function and not a C++ object?
Simple, we encapsulate the objects in a lambda:
Sometimes I need generators in C++, and if the code is not long I just encapsulate them in a lambda. The static members declared inside belong only to that instance of the lambda, so I can have several instances. This would not work with a plain function, which has only one copy of its static storage.
Just as a function simulates the concept of a stack, generators simulate the concept of a queue. The rest is semantics.
As a side note, you can always simulate a queue with a stack by using a stack of operations instead of data. What that practically means is that you can implement queue-like behavior by returning a pair, the second value of which either holds the next function to be called or indicates that we are out of values. But this is more general than what yield vs. return does. It allows simulating a queue of arbitrary values rather than the homogeneous values you expect from a generator, without keeping a full internal queue.
More specifically, since C++ does not have a natural abstraction for a queue, you need to use constructs which implement a queue internally. So the answer which gave the example with iterators is a decent implementation of the concept.
What this practically means is that you can implement something with bare-bones queue functionality if you just want something quick, and then consume the queue's values just as you would consume values yielded from a generator.