Haskell 中的所有内容都存储在 thunk 中吗,甚至是简单的值?

发布于 2024-12-20 23:27:44 字数 268 浏览 3 评论 0原文

以下值/表达式/函数的 thunk 在 Haskell 堆中是什么样子的?

val = 5                -- is `val` a pointer to a box containing 5?
add x y = x + y        
result = add 2 val     
main = print $ result

考虑到 Haskell 的惰性求值模式,如果能够了解这些在 Haskell 中的表示方式,那就太好了。

What do the thunks for the following value/expression/function look like in the Haskell heap?

val = 5                -- is `val` a pointer to a box containing 5?
add x y = x + y        
result = add 2 val     
main = print $ result

Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

波浪屿的海角声 2024-12-27 23:27:44

官方答复

不关你的事。编译器的严格实现细节。

简短回答

是的。

更长的答案

对于 Haskell 程序本身来说,答案总是肯定的,但是如果编译器发现它可以摆脱它,出于性能原因,它可以并且将会做不同的事情。

例如,对于“add xy = x + y”,编译器可能会生成与 x 和 y 的 thunk 一起使用的代码,并最终构造一个 thunk。
但请考虑以下情况:

foo :: Int -> Int -> Int
foo x y = x * x + y * y

在这里,优化编译器将生成代码,首先将 x 和 y 从其盒子中取出,然后执行所有算术,然后将结果存储在盒子中。

高级答案

本文描述了 GHC 如何从一种实现 thunk 的方式切换到另一种实际上更简单、更快的方式:
http://research.microsoft.com/en-us/um/people /simonpj/papers/eval-apply/

Official answer

It's none of your business. Strictly implementation detail of your compiler.

Short answer

Yes.

Longer answer

To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.

For example, for '''add x y = x + y''', a compiler might generate code that works with thunks for x and y and constructs a thunk as a result.
But consider the following:

foo :: Int -> Int -> Int
foo x y = x * x + y * y

Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.

Advanced answer

This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster:
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

百变从容 2024-12-27 23:27:44

一般来说,即使 Haskell 中的原始值(例如 Int 和 Float 类型)也由 thunk 表示。这确实是非严格语义所要求的;考虑以下片段:

bottom :: Int
bottom = div 1 0

仅当检查了 Bottom 的值时,此定义才会生成除零的异常,但如果从未使用过该值则不会。

现在考虑 add 函数:

add :: Int -> Int -> Int
add x y = x+y

add 的简单实现必须强制 x 的 thunk,强制 y 的 thunk,添加值并为结果创建一个(评估的)thunk。与严格的函数式语言(更不用说命令式语言)相比,这对于算术来说是巨大的开销。

然而,像 GHC 这样的优化编译器基本上可以避免这种开销;这是 GHC 如何转换 add 函数的简化视图:

add :: Int -> Int -> Int
add (I# x) (I# y) = case# (x +# y) of z -> I# z 

在内部,像 Int 这样的基本类型被视为具有单个构造函数的数据类型。 Int# 类型是整数的“原始”机器类型,+# 是原始类型的原始加法。
对原始类型的操作直接在位模式(例如寄存器)上实现——而不是重击。然后装箱和拆箱被翻译为构造函数应用和模式匹配。

这种方法的优点(在这个简单的示例中不可见)是编译器通常能够内联此类定义并删除中间装箱/拆箱操作,只留下最外面的操作。

In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:

bottom :: Int
bottom = div 1 0

This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.

Consider now the add function:

add :: Int -> Int -> Int
add x y = x+y

A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).

However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:

add :: Int -> Int -> Int
add (I# x) (I# y) = case# (x +# y) of z -> I# z 

Internally, basic types like Int is seen as datatype with a single constructor. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw types.
Operations on raw types are implemented directly on bit-patterns (e.g. registers) --- not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.

The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.

肤浅与狂妄 2024-12-27 23:27:44

将每个值包装在 thunk 中是绝对正确的。但由于 Haskell 是非严格的,编译器可以选择何时评估 thunk/表达式。特别是,如果可以生成更好的代码,编译器可以选择比严格必要的时间更早地计算表达式。

优化 Haskell 编译器 (GHC) 执行严格性分析来找出,值将始终被计算。

一开始,编译器必须假设函数的任何参数都没有被使用过。然后它遍历函数体并尝试找到以下函数应用程序:1)已知其参数(至少某些)是严格的,2)始终必须进行评估以计算函数的结果。

在您的示例中,我们的函数 (+) 的参数都很严格。因此,编译器知道此时始终需要评估 xy
现在碰巧的是,表达式x+y始终是计算函数结果所必需的,因此编译器可以存储函数add在两者中都是严格的信息xy

因此,为 add* 生成的代码将期望整数值作为参数,而不是 thunk。当涉及递归(不动点问题)时,算法会变得更加复杂,但基本思想保持不变。

另一个例子:

mkList x y = 
    if x then y : []
         else []

此函数将采用求值形式的 x(作为布尔值)和 y 作为 thunk。表达式x需要通过mkList在每个可能的执行路径中进行计算,这样我们就可以让调用者对其进行计算。另一方面,表达式 y 永远不会在任何参数严格的函数应用程序中使用。 cons 函数 : 从不查看 y 它只是将其存储在列表中。因此 y 需要作为 thunk 传递以满足惰性 Haskell 语义。

mkList False undefined -- absolutely legal

*: add 当然是多态的,xy 的确切类型取决于实例化。

It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.

Optimizing Haskell compilers (GHC) perform Strictness analysis to figure out, which values will always be computed.

In the beginning, the compiler has to assume, that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find functions applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.

In your example, we have the function (+) that is strict in both it's arguments. Thus the compiler knows that both x and y are always required to be evaluated at this point.
Now it just so happens, that the expression x+y is always necessary to compute the function's result, therefore the compiler can store the information that the function add is strict in both x and y.

The generated code for add* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.

Another example:

mkList x y = 
    if x then y : []
         else []

This function will take x in evaluated form (as a boolean) and y as a thunk. The expression x needs to be evaluated in every possible execution path through mkList, thus we can have the caller evaluate it. The expression y, on the other hand, is never used in any function application that is strict in it's arguments. The cons-function : never looks at y it just stores it in a list. Thus y needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.

mkList False undefined -- absolutely legal

*: add is of course polymorphic and the exact type of x and y depends on the instantiation.

你如我软肋 2024-12-27 23:27:44

简短回答:是的。

长答案:

val = 5

这必须存储在一个 thunk 中,因为想象一下,如果我们在代码中任何地方编写这个代码(例如,在我们导入的库或其他东西中):

val = undefined

如果在我们的程序中必须对此进行评估启动,就会崩溃,对吗?如果我们实际上将该值用于某些东西,那将是我们想要的,但如果我们不使用它,它不应该对我们的程序产生如此灾难性的影响。

对于第二个例子,让我稍微改变一下:

div x y = x / y

这个值也必须存储在一个 thunk 中,因为想象一下这样的代码:

average list =
  if null list
     then 0
     else div (sum list) (length list)

如果 div 这里是严格的,即使当该列表为 null (又名空),这意味着像这样编写函数是行不通的,因为当给定以下值时,它没有机会返回 0空列表,即使这是我们在这种情况下想要的。

您的最后一个示例只是示例 1 的变体,并且出于相同的原因它必须是惰性的。

话虽如此,可以强制编译器使某些值严格,但这超出了这个问题的范围。

Short answer: Yes.

Long answer:

val = 5

This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):

val = undefined

If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.

For your second example, let me change it a little:

div x y = x / y

This value has to be stored in a thunk as well, because imagine some code like this:

average list =
  if null list
     then 0
     else div (sum list) (length list)

If div was strict here, it would be evaluated even when the list is null (aka. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return 0 when given the empty list, even though this is what we would want in this case.

Your final example is just a variation of example 1, and it has to be lazy for the same reasons.

All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.

埋情葬爱 2024-12-27 23:27:44

我认为其他人很好地回答了你的问题,但为了完整起见,让我补充一下,GHC 也为你提供了直接使用未装箱值的可能性。这是 Haskell Wiki 对此的说法

当您真的非常渴望速度,并且想要直接了解“原始位”时。有关使用未装箱类型的一些信息,请参阅 GHC 基元。

但是,这应该是最后的手段,因为未装箱的类型和基元是不可移植的。幸运的是,通常没有必要诉诸于使用显式的拆箱类型和原语,因为 GHC 的优化器可以通过内联它知道的操作以及拆箱严格的函数参数来为您完成工作。严格且解压的构造函数字段也有很大帮助。有时,GHC 需要一些帮助才能生成正确的代码,因此您可能需要查看核心输出,看看您的调整是否确实达到了预期的效果。

使用未装箱的类型和原语可以说的一件事是,您知道自己正在编写高效的代码,而不是依赖 GHC 的优化器来做正确的事情,并受到 GHC 优化器的更改的摆布线。这对您来说可能很重要,在这种情况下就去做吧。

如前所述,它是不可移植的,因此您需要 GHC 语言扩展。请参阅此处获取他们的文档。

I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:

When you are really desperate for speed, and you want to get right down to the “raw bits.” Please see GHC Primitives for some information about using unboxed types.

This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments. Strict and unpacked constructor fields can also help a lot. Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.

One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.

As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文