实施“生成器”支持自定义语言
我对语言设计有一点迷恋,目前我正在尝试自己喜欢的语言。 (http://rogeralsing.com/2010/04/14/playing-with - Plastic/)
真正让我心碎的一件事是“生成器”和“yield”关键字。 我知道 C# 使用 AST 转换将枚举器方法转换为状态机。
但它在其他语言中是如何工作的呢? 有什么方法可以在不进行 AST 转换的语言中获得生成器支持吗? 例如,像 Python 或 Ruby 这样的语言是否会通过 AST 转换来解决这个问题?
(问题是生成器如何在不同语言的底层实现,而不是如何用其中一种语言编写生成器)
I've got a bit of fettish for language design and I'm currently playing around with my own hobby language. (http://rogeralsing.com/2010/04/14/playing-with-plastic/)
One thing that really makes my mind bleed is "generators" and the "yield" keyword.
I know C# uses AST transformation to transform enumerator methods into statemachines.
But how does it work in other languages?
Is there any way to get generator support in a language w/o AST transformation?
e.g. Does languages like Python or Ruby resort to AST transformations to solve this to?
(The question is how generators are implemented under the hood in different languages, not how to write a generator in one of them)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
生成器基本上是半协程,有一些恼人的限制。因此,显然,您可以使用半协程(当然还有完整协程)来实现它们。
如果您没有协程,则可以使用任何其他通用控制流结构。有很多控制流结构是“通用的”,每个控制流结构(包括所有其他通用控制流结构),包括协程,因此生成器可以(或多或少) )只是简单地转化为那个通用结构。
其中最著名的可能是
GOTO
。只需使用GOTO
,您就可以构建任何其他控制流结构:IF-THEN-ELSE
、WHILE
、< code>FOR、REPEAT-UNTIL
、FOREACH
、异常、线程、子程序调用、方法调用、函数调用等等,当然还有协程和发电机。几乎所有的CPU都支持
GOTO
(尽管在CPU中,他们通常称之为jmp
)。事实上,在许多 CPU 中,GOTO
是唯一的控制流结构,尽管现在至少支持子例程调用 (call
),也许通常还内置了一些原始形式的异常处理和/或并发原语(比较和交换)。另一个众所周知的控制流原语是延续。延续基本上是 GOTO 的一种更结构化、更易于管理且不那么邪恶的变体,在函数式语言中尤其流行。但也有一些低级语言将其控制流基于延续,例如 Parrot 虚拟机使用延续来控制流,我相信在某些研究实验室中甚至有一些基于延续的 CPU。
C 有一种“蹩脚”形式的延续(
setjmp
和longjmp
),它们比“真正的”延续功能更弱,也更不容易使用,但它们足够强大来实现生成器(事实上,可以用来实现完整的延续)。在 Unix 平台上,
setcontext
可以用作setjmp
/longjmp
的更强大、更高级别的替代方案。另一个众所周知的控制流构造是例外,但可能不会因为低级底层构建而想到,其他控制流构造。有一篇论文表明异常可以比延续更强大,因此使得异常本质上等同于 GOTO ,因此具有普遍的强大功能。事实上,异常有时被用作通用控制流结构:Microsoft Volta 项目将 .NET 字节码编译为 JavaScript,使用 JavaScript 异常来实现 .NET 线程和生成器。
不通用,但可能强大到足以实现生成器,这只是简单的尾调用优化。 (不过,我可能是错的。不幸的是,我没有证据。)我认为您可以将生成器转换为一组相互尾递归函数。我知道状态机可以使用尾部调用来实现,所以我很确定生成器也可以,因为毕竟 C# 将生成器实现为状态机。 (我认为这与惰性求值结合起来效果特别好。)
最后但并非最不重要的一点是,在具有具体化调用堆栈的语言中(例如大多数 Smalltalks),您可以构建几乎任何类型的您想要的控制流结构。 (事实上,具体化的调用堆栈基本上是相当于功能高级延续的过程低级。)
那么,生成器的其他实现是什么样的?
Lua 本身没有生成器,但它有完整的非对称协程。主要的 C 实现使用
setjmp
/longjmp
来实现它们。Ruby 本身也没有生成器,但它有
Enumerator
,可以用作生成器。枚举器
不是语言的一部分,它们是一个库功能。 MRI 使用延续实现Enumerator
,而延续又使用setjmp
/longjmp
实现。 YARV 使用Fiber
(Ruby 拼写“协程”的方式)实现Enumerator
,并且那些使用setjmp实现代码>/<代码>longjmp。我相信 JRuby 目前使用线程实现枚举器,但一旦 JVM 获得了一些更好的控制流结构,他们就希望切换到更好的东西。
Python 的生成器实际上或多或少是成熟的协程。 CPython 使用
setjmp
/longjmp
实现它们。Generators are basically semi-coroutines with some annoying limitations. So, obviously, you can implement them using semi-coroutines (and full coroutines, of course).
If you don't have coroutines, you can use any of the other universal control flow constructs. There are a lot of control flow constructs that are "universal" in the sense that every control flow construct (including all the other universal control flow constructs), including coroutines and thus generators can be (more or less) trivially transformed into only that universal construct.
The most well-known of those is probably
GOTO
. With justGOTO
, you can build any other control flow construct:IF-THEN-ELSE
,WHILE
,FOR
,REPEAT-UNTIL
,FOREACH
, exceptions, threads, subroutine calls, method calls, function calls and so on, and of course also coroutines and generators.Almost all CPUs support
GOTO
(although in a CPU, they usually call itjmp
). In fact, in many CPUs,GOTO
is the only control flow construct, although today native support for at least subroutine calls (call
) and maybe some primitive form of exception handling and/or concurrency primitive (compare-and-swap) are usually also built in.Another well-known control flow primitive are continuations. Continuations are basically a more structured, better manageable and less evil variant of
GOTO
, especially popular in functional languages. But there also some low-level languages that base their control flow on continuations, for example the Parrot Virtual Machine uses continuations for control flow and I believe there are even some continuation-based CPUs in some research lab somewhere.C has a sort-of "crappy" form of continuations (
setjmp
andlongjmp
), that are much less powerful and less easy to use than "real" continuations, but they are plenty powerful enough to implement generators (and in fact, can be used to implement full continuations).On a Unix platform,
setcontext
can be used as a more powerful and higher level alternative tosetjmp
/longjmp
.Another control flow construct that is well-known, but doesn't probably spring to mind as a low-level substrate build other control flow constructs on top of, are exceptions. There is a paper that shows that exceptions can be more powerful than continuations, thus making exceptions essentially equivalent to
GOTO
and thus universally powerful. And, in fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode to JavaScript, used JavaScript exceptions to implement .NET threads and generators.Not universal, but probably powerful enough to implement generators is just plain tail call optimization. (I might be wrong, though. I unfortunately don't have a proof.) I think you can transform a generator into a set of mutually tail-recursive functions. I know that state machines can be implemented using tail calls, so I'm pretty sure generators can, too, since, after all, C# implements generators as state machines. (I think this works especially well together with lazy evaluation.)
Last but not least, in a language with a reified call stack (like most Smalltalks for example), you can build pretty much any kind of control flow constructs you want. (In fact, a reified call stack is basically the procedural low-level equivalent to the functional high-level continuation.)
So, what do other implementations of generators look like?
Lua doesn't have generators per se, but it has full asymmetric coroutines. The main C implementation uses
setjmp
/longjmp
to implement them.Ruby also doesn't have generators per se, but it has
Enumerator
s, that can be used as generators.Enumerator
s are not part of the language, they are a library feature. MRI implementsEnumerator
s using continuations, which in turn are implemented usingsetjmp
/longjmp
. YARV implementsEnumerator
s usingFiber
s (which is how Ruby spells "coroutines"), and those are implemented usingsetjmp
/longjmp
. I believe JRuby currently implementsEnumerator
s using threads, but they want to switch to something better as soon as the JVM gains some better control flow constructs.Python has generators that are actually more or less full-blown coroutines. CPython implements them using
setjmp
/longjmp
.