Haskell 中的所有内容都存储在 thunk 中吗,甚至是简单的值?
以下值/表达式/函数的 thunk 在 Haskell 堆中是什么样子的?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
考虑到 Haskell 的惰性求值模式,如果能够了解这些在 Haskell 中的表示方式,那就太好了。
What do the thunks for the following value/expression/function look like in the Haskell heap?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
官方答复
不关你的事。编译器的严格实现细节。
简短回答
是的。
更长的答案
对于 Haskell 程序本身来说,答案总是肯定的,但是如果编译器发现它可以摆脱它,出于性能原因,它可以并且将会做不同的事情。
例如,对于“add xy = x + y”,编译器可能会生成与 x 和 y 的 thunk 一起使用的代码,并最终构造一个 thunk。
但请考虑以下情况:
在这里,优化编译器将生成代码,首先将 x 和 y 从其盒子中取出,然后执行所有算术,然后将结果存储在盒子中。
高级答案
本文描述了 GHC 如何从一种实现 thunk 的方式切换到另一种实际上更简单、更快的方式:
http://research.microsoft.com/en-us/um/people /simonpj/papers/eval-apply/
Official answer
It's none of your business. Strictly implementation detail of your compiler.
Short answer
Yes.
Longer answer
To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.
For example, for '''add x y = x + y''', a compiler might generate code that works with thunks for x and y and constructs a thunk as a result.
But consider the following:
Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.
Advanced answer
This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster:
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/
一般来说,即使 Haskell 中的原始值(例如 Int 和 Float 类型)也由 thunk 表示。这确实是非严格语义所要求的;考虑以下片段:
仅当检查了 Bottom 的值时,此定义才会生成除零的异常,但如果从未使用过该值则不会。
现在考虑 add 函数:
add 的简单实现必须强制 x 的 thunk,强制 y 的 thunk,添加值并为结果创建一个(评估的)thunk。与严格的函数式语言(更不用说命令式语言)相比,这对于算术来说是巨大的开销。
然而,像 GHC 这样的优化编译器基本上可以避免这种开销;这是 GHC 如何转换 add 函数的简化视图:
在内部,像 Int 这样的基本类型被视为具有单个构造函数的数据类型。 Int# 类型是整数的“原始”机器类型,+# 是原始类型的原始加法。
对原始类型的操作直接在位模式(例如寄存器)上实现——而不是重击。然后装箱和拆箱被翻译为构造函数应用和模式匹配。
这种方法的优点(在这个简单的示例中不可见)是编译器通常能够内联此类定义并删除中间装箱/拆箱操作,只留下最外面的操作。
In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:
This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.
Consider now the add function:
A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).
However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:
Internally, basic types like Int is seen as datatype with a single constructor. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw types.
Operations on raw types are implemented directly on bit-patterns (e.g. registers) --- not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.
The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.
将每个值包装在 thunk 中是绝对正确的。但由于 Haskell 是非严格的,编译器可以选择何时评估 thunk/表达式。特别是,如果可以生成更好的代码,编译器可以选择比严格必要的时间更早地计算表达式。
优化 Haskell 编译器 (GHC) 执行严格性分析来找出,值将始终被计算。
一开始,编译器必须假设函数的任何参数都没有被使用过。然后它遍历函数体并尝试找到以下函数应用程序:1)已知其参数(至少某些)是严格的,2)始终必须进行评估以计算函数的结果。
在您的示例中,我们的函数
(+)
的参数都很严格。因此,编译器知道此时始终需要评估x
和y
。现在碰巧的是,表达式
x+y
始终是计算函数结果所必需的,因此编译器可以存储函数add
在两者中都是严格的信息x
和y
。因此,为
add
* 生成的代码将期望整数值作为参数,而不是 thunk。当涉及递归(不动点问题)时,算法会变得更加复杂,但基本思想保持不变。另一个例子:
此函数将采用求值形式的
x
(作为布尔值)和y
作为 thunk。表达式x
需要通过mkList
在每个可能的执行路径中进行计算,这样我们就可以让调用者对其进行计算。另一方面,表达式 y 永远不会在任何参数严格的函数应用程序中使用。 cons 函数:
从不查看y
它只是将其存储在列表中。因此 y 需要作为 thunk 传递以满足惰性 Haskell 语义。*:
add
当然是多态的,x
和y
的确切类型取决于实例化。It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.
Optimizing Haskell compilers (GHC) perform Strictness analysis to figure out, which values will always be computed.
In the beginning, the compiler has to assume, that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find functions applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.
In your example, we have the function
(+)
that is strict in both it's arguments. Thus the compiler knows that bothx
andy
are always required to be evaluated at this point.Now it just so happens, that the expression
x+y
is always necessary to compute the function's result, therefore the compiler can store the information that the functionadd
is strict in bothx
andy
.The generated code for
add
* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.Another example:
This function will take
x
in evaluated form (as a boolean) andy
as a thunk. The expressionx
needs to be evaluated in every possible execution path throughmkList
, thus we can have the caller evaluate it. The expressiony
, on the other hand, is never used in any function application that is strict in it's arguments. The cons-function:
never looks aty
it just stores it in a list. Thusy
needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.*:
add
is of course polymorphic and the exact type ofx
andy
depends on the instantiation.简短回答:是的。
长答案:
这必须存储在一个 thunk 中,因为想象一下,如果我们在代码中任何地方编写这个代码(例如,在我们导入的库或其他东西中):
如果在我们的程序中必须对此进行评估启动,就会崩溃,对吗?如果我们实际上将该值用于某些东西,那将是我们想要的,但如果我们不使用它,它不应该对我们的程序产生如此灾难性的影响。
对于第二个例子,让我稍微改变一下:
这个值也必须存储在一个 thunk 中,因为想象一下这样的代码:
如果
div
这里是严格的,即使当该列表为null
(又名空),这意味着像这样编写函数是行不通的,因为当给定以下值时,它没有机会返回0
空列表,即使这是我们在这种情况下想要的。您的最后一个示例只是示例 1 的变体,并且出于相同的原因它必须是惰性的。
话虽如此,可以强制编译器使某些值严格,但这超出了这个问题的范围。
Short answer: Yes.
Long answer:
This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):
If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.
For your second example, let me change it a little:
This value has to be stored in a thunk as well, because imagine some code like this:
If
div
was strict here, it would be evaluated even when the list isnull
(aka. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return0
when given the empty list, even though this is what we would want in this case.Your final example is just a variation of example 1, and it has to be lazy for the same reasons.
All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.
我认为其他人很好地回答了你的问题,但为了完整起见,让我补充一下,GHC 也为你提供了直接使用未装箱值的可能性。这是 Haskell Wiki 对此的说法:
如前所述,它是不可移植的,因此您需要 GHC 语言扩展。请参阅此处获取他们的文档。
I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:
As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.