好奇“loop=loop”是如何产生的在 Haskell 中进行评估
我认为这样的表达式会导致 Haskell 永远求值。但 GHCi 和编译程序的行为都让我感到惊讶。
例如,在 GHCi 中,这些表达式会一直阻塞,直到我 Control+C
,但不会消耗 CPU。看上去就像是在睡觉。
let loop = loop
let loop = 1 + loop
我尝试用 GHC 编译这些程序:
main = print loop
where loop = 1 + loop
main = print loop
where loop = if True then loop else 1
打印的内容是:
Main: <<loop>>
所以我的问题是:显然这些表达式被编译为与命令式语言中的循环或递归调用不同的东西。它们被编译成什么?这是处理位于右侧的 0-arg 函数的特殊规则,还是我不知道的更一般的情况的特殊情况?
[编辑]:
还有一个问题:如果这恰好是编译器的特殊处理,那么当不可能检查所有无限循环时,这样做的原因是什么? “熟悉的”语言不关心像 while (true);
或 int f() { return f();}
这样的情况,对吗?
非常感谢。
I thought expressions like this would cause Haskell to evaluate forever. But the behaviors in both GHCi and the compiled program surprised me.
For example, in GHCi, these expressions blocked until I Control+C
, but consumed no CPU. Looked like it was sleeping.
let loop = loop
let loop = 1 + loop
I tried compiling these programs with GHC:
main = print loop
where loop = 1 + loop
main = print loop
where loop = if True then loop else 1
What was printed was:
Main: <<loop>>
So my question is: Obviously these expressions are compiled to something different than loops or recursive calls in imperative languages. What are they compiled to? Is this a special rule to handle 0-arg functions that have themselves in the right hand side, or it's a special case of something more general that I don't know?
[EDIT]:
One more question: If this happens to be a special handling from the compiler, what is the reason behind doing this when it's impossible to check for all infinite loops? 'Familiar' languages don't care about cases like while (true);
or int f() { return f();}
, right?
Many thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
GHC 将 Haskell 实现为图形缩减机。将您的程序想象为一个图表,其中每个值作为一个节点,以及从该值到该值所依赖的每个值的线。不过,我们很懒,所以你实际上只从一个节点开始——为了评估该节点,GHC 必须“输入”它并将其打开到一个带有参数的函数。然后,它用函数体替换函数调用,并尝试将其减少到足以使其成为头部正常形式等。
上面的内容非常简单,我确信为了简洁起见,省略了一些必要的细节。
无论如何,当 GHC 输入一个值时,它通常会在评估节点时(或者,根据您的术语,在减少闭包时)将其替换为黑洞。这有很多目的。首先,它堵塞了潜在的空间泄漏。如果节点引用了一个在其他地方没有使用的值,则即使在评估节点时,黑洞也允许对该值进行垃圾收集。其次,这可以防止某些类型的重复工作,因为在多线程环境中,两个线程可能会尝试输入相同的值。黑洞将导致第二个线程阻塞,而不是评估已经评估的值。最后,这恰好允许有限形式的循环检测,因为如果线程尝试重新进入其自己的黑洞,我们可以抛出异常。
这是一个更隐喻的解释。如果我有一系列在屏幕上移动乌龟(在徽标中)的指令,则没有一种方法可以判断它们将产生什么形状,或者该形状是否在不运行它们的情况下终止。但是,如果在运行它们时,我注意到乌龟的路径已经穿过了自己,我可以向用户指示“啊哈!乌龟已经穿过了它的路径!”所以我知道乌龟已经到达了它之前到达过的地方 - 如果路径是通过评估图的节点的电路,那么这告诉我们我们处于循环中。然而,海龟也可以进入,例如,扩张的螺旋。它永远不会终止,但也永远不会穿越之前的路径。
因此,由于黑洞的使用,出于多种原因,我们有一些评估所遵循的标记“路径”的概念。如果路径与自身交叉,我们可以判断并抛出异常。然而,事情有一百万种不涉及路径交叉的分歧方式。在这些情况下,我们无法判断,也不会抛出异常。
有关当前黑洞实现的超级极客技术细节,请参阅 Simon Marlow 在最近的 Haskell Implementors Workshop 上的演讲,“在多核上安排惰性评估”,位于 http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010。
GHC implements Haskell as a graph reduction machine. Imagine your program as a graph with each value as a node, and lines from it to each value that value depends on. Except, we're lazy, so you really start with just one node -- and to evaluate that node, GHC has to "enter" it and open it up to a function with arguments. It then replaces the function call with the body of the function, and attempts to reduce it enough to get it into head normal form, etc.
The above being very handwavy and I'm sure eliding some necessary detail in the interest of brevity.
In any case, when GHC enters a value, it generally replaces it with a black hole while the node is being evaluated (or, depending on your terminology, while the closure is being reduced) This has a number of purposes. First, it plugs a potential space leak. If the node references a value which is used nowhere else, the black hole allows that value to be garbage-collected even while the node is being evaluated. Second, this prevents certain types of duplicate work, since in a multi-threaded environment, two threads may attempt to enter the same value. The black-hole will cause the second thread to block rather than evaluate the value already being evaluated. Finally, this happens to allow for a limited form of loop detection, since if a thread attempts to re-enter its own black hole, we can throw an exception.
Here's a bit of a more metaphorical explanation. If I have a series of instructions that moves a turtle (in logo) around the screen, there's no one way to tell what shape they will produce, or whether that shape terminates without running them. But if, while running them, I notice that the path of the turtle has crossed itself, I can indicate to the user "aha! the turtle has crossed its path!" So I know that the turtle has reached a spot it has been before -- if the path is a circuit through evaluating the nodes of a graph, then that tells us we're in a loop. However, the turtle can also go in, for example, an expanding spiral. And it will never terminate, but it will also never cross its prior path.
So, because of the use of black holes, for multiple reasons, we have some notion of a marked "path" that evaluation has followed. And if the path crosses itself, we can tell and throw an exception. However, there are a million ways for things to diverge that don't involve the path crossing itself. And in those cases, we can't tell, and don't throw an exception.
For super-geeky technical detail about the current implementation of black holes, see Simon Marlow's talk from the recent Haskell Implementors Workshop, "Scheduling Lazy Evaluation on Multicore" at the bottom of http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010.
在某些有限的情况下,编译器可以确定这样的循环存在作为其其他控制流分析的一部分,然后用引发适当异常的代码替换循环项。当然,这不可能在所有情况下都完成,而只能在一些更明显的情况下完成,在这些情况下,它会自然地从编译器正在执行的其他工作中脱颖而出。
至于为什么 Haskell 比其他语言更频繁地发现这种情况:
_|_
(“底部”),用于表示错误值。对自身严格的值 - 即它们依赖于自己的值来计算 - 是_|_
。_|_
的模式匹配结果可以是无限循环,也可以是异常;你的编译器在这里选择后者。In some, limited cases, the compiler can determine such a loop exists as part of its other control flow analyses, and at that point replaces the looping term with code that throws an appropriate exception. This cannot be done in all cases, of course, but only in some of the more obvious cases, where it falls out naturally from other work the compiler is doing.
As for why Haskell finds this more often than other languages:
_|_
("the bottom"), which is used to represent erroneous values. Values which are strict on themselves - ie, they depend on their own value to compute - are_|_
. The result of pattern-matching on_|_
can either be an infinite loop or an exception; your compiler is choosing the latter here.