Advantages of strict fields in data types

Posted on 2024-12-22 06:10:27

This may be a bit vague, but I've been wondering about it for a while. As far as I know, putting a ! on a field ensures that the corresponding argument of a data constructor is evaluated before the value is constructed:

data Foo = Bar !Int !Float

I have often thought that laziness is a great thing. Yet when I read through source code, I see strict fields more often than the !-less variant.

What is the advantage of this, and why shouldn't I just leave the fields lazy?

Comments (4)

北渚 2024-12-29 06:10:27

Unless you're storing a large computation in the Int and Float fields, significant overhead can accumulate from lots of trivial computations piling up in thunks. For instance, if you repeatedly add 1 to a lazy Float field in a data type, it will use more and more memory until you actually force the field and the sum is calculated.
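The effect can be sketched as follows. This is only an illustration: LazyBox, StrictBox, the bump functions and the times helper are hypothetical names, not part of the question. The loop forces each intermediate box to WHNF only, so the lazy field ends up holding a chain of a million pending (+ 1) applications, while the strict field holds a plain number after every step.

```haskell
-- Hypothetical types for illustration: same shape, different field strictness.
data LazyBox   = LazyBox Float
data StrictBox = StrictBox !Float

bumpLazy :: LazyBox -> LazyBox
bumpLazy (LazyBox x) = LazyBox (x + 1)       -- stores the unevaluated thunk (x + 1)

bumpStrict :: StrictBox -> StrictBox
bumpStrict (StrictBox x) = StrictBox (x + 1) -- (x + 1) is evaluated when the box is built

-- Apply f n times, forcing each intermediate result to WHNF only.
times :: Int -> (a -> a) -> a -> a
times n f x
  | n <= 0    = x
  | otherwise = let y = f x in y `seq` times (n - 1) f y

unLazy :: LazyBox -> Float
unLazy (LazyBox x) = x

unStrict :: StrictBox -> Float
unStrict (StrictBox x) = x

main :: IO ()
main = do
  -- Same result both times; only the memory behaviour differs.
  print (unLazy   (times 1000000 bumpLazy   (LazyBox 0)))
  print (unStrict (times 1000000 bumpStrict (StrictBox 0)))
```

Running the program with GHC's +RTS -s statistics should show the difference in peak memory, although the exact numbers depend on the compiler version and optimisation level.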

Often, you do want to store an expensive computation in a field. But if you know ahead of time that you won't be doing anything like that, you can mark the field strict and avoid having to add seq manually everywhere to get the efficiency you desire.

As an additional bonus, when given the flag -funbox-strict-fields, GHC will unpack strict fields (see note 1) of data types directly into the data type itself. This is possible because it knows they will always be evaluated, so no thunk ever has to be allocated for them. In this case, a Bar value would contain the machine words comprising the Int and Float directly inside the Bar value in memory, rather than two pointers to separately allocated heap objects that hold the data.
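Unpacking can also be requested field by field with GHC's UNPACK pragma instead of the module-wide flag. A minimal sketch, reusing the Foo type from the question (the Show instance and main are added here only so the example runs):

```haskell
-- Per-field alternative to compiling with -funbox-strict-fields.
-- UNPACK only has an effect on strict fields, hence the bangs.
data Foo = Bar {-# UNPACK #-} !Int {-# UNPACK #-} !Float
  deriving Show

main :: IO ()
main = print (Bar (1 + 2) 0.5)  -- prints: Bar 3 0.5
```

The pragma is a request to the optimiser, so whether the fields are really stored inline can depend on the optimisation level.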

Laziness is a very useful thing, but some of the time, it just gets in the way and impedes computation, especially for small fields that are always looked at (and thus forced), or that are modified often but never with very expensive computations. Strict fields help overcome these issues without having to modify all uses of the data type.

Whether it's more common than lazy fields or not depends on the type of code you're reading; you aren't likely to see any functional tree structures use strict fields extensively, for instance, because they benefit greatly from laziness.

Let's say you have an AST with a constructor for infix operations:

data Exp = Infix Op Exp Exp
         | ...

data Op = Add | Subtract | Multiply | Divide

You wouldn't want to make the Exp fields strict, as applying a policy like that would mean that the entire AST is evaluated whenever you look at the top-level node, which is clearly not what you want if you are to benefit from laziness. However, the Op field is never going to contain an expensive computation that you want to defer until later, and the overhead of a thunk per infix operator might get expensive if you have really deeply nested parse trees. So for the Infix constructor, you'd want to make the Op field strict, but leave the two Exp fields lazy.
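That choice might look like the sketch below. The Lit constructor and the topOp function are hypothetical, added only to stand in for the elided alternatives and to make the example runnable. Because the Exp fields are lazy, the top-level operator can be inspected even though both subtrees are undefined.

```haskell
data Op = Add | Subtract | Multiply | Divide
  deriving Show

-- Strict operator, lazy subexpressions.
-- Lit is a hypothetical constructor standing in for the elided "...".
data Exp = Infix !Op Exp Exp
         | Lit Int

-- Inspect only the top-level operator; the subtrees are never forced.
topOp :: Exp -> Maybe Op
topOp (Infix op _ _) = Just op
topOp (Lit _)        = Nothing

main :: IO ()
main = print (topOp (Infix Add undefined undefined))  -- prints: Just Add
```

Had the two Exp fields been marked strict as well, constructing this Infix value would have forced the undefined subtrees and raised an exception.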

Note 1: Only single-constructor types can be unpacked.

故人如初 2024-12-29 06:10:27

In addition to the information provided by other answers, keep in mind:

To my knowledge with !, one can make sure a parameter for a data constructor is being evaluated before the value is constructed

It's interesting to look at how deeply the argument is evaluated: as with seq and $!, it is evaluated only to weak head normal form (WHNF).

Given the datatypes

data Foo = IntFoo !Int | FooFoo !Foo | BarFoo !Bar
data Bar = IntBar Int

the expression

let x' = IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value IntFoo 6 (== fully evaluated, == NF).
Additionally this expression

let x' = FooFoo $ IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value FooFoo (IntFoo 6) (== fully evaluated, == NF).
However, this expression

let x' = BarFoo $ IntBar $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value BarFoo (IntBar (1 + 2 + 3)) (!= fully evaluated, != NF).

Main point: The strictness of the !Bar parameter won't necessarily help if the data constructors of Bar don't contain strict parameters themselves.
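One way to observe this is to put undefined in the innermost position and check whether evaluating the outer value to WHNF raises an exception. The sketch below reuses the Foo and Bar types from above; whnfFails is a hypothetical helper built on evaluate and try from Control.Exception.

```haskell
import Control.Exception (SomeException, evaluate, try)

data Foo = IntFoo !Int | FooFoo !Foo | BarFoo !Bar
data Bar = IntBar Int

-- Evaluate a value to WHNF and report whether doing so raised an exception.
whnfFails :: a -> IO Bool
whnfFails x = do
  r <- try (evaluate x)
  pure $ case r of
    Left e  -> const True (e :: SomeException)
    Right _ -> False

main :: IO ()
main = do
  whnfFails (IntFoo undefined)          >>= print  -- True: the strict Int is forced
  whnfFails (FooFoo (IntFoo undefined)) >>= print  -- True: strictness chains through Foo
  whnfFails (BarFoo (IntBar undefined)) >>= print  -- False: IntBar's field is lazy
```

In the last case the !Bar field forces only the IntBar constructor, so the undefined inside it is never touched.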

烟花肆意 2024-12-29 06:10:27

There is an overhead associated with laziness: the compiler has to create a thunk for the value to store the computation until the result is needed. If you know that you'll always need the result sooner or later, then it can make sense to force the evaluation of the result.

要走干脆点 2024-12-29 06:10:27

Laziness comes at a cost, otherwise every language would have it.

The cost is twofold:

  1. It may take longer to set up the thunk (i.e. the description of what has to be computed once the value is eventually needed) than to do the operation right away.
  2. Unevaluated thunks that are passed as non-strict arguments to other thunks, which are in turn passed as non-strict arguments to yet other thunks, and so on, will use more and more memory. Unfortunately, those thunks may also hold references to memory that is otherwise no longer reachable, i.e. memory that could be freed if only the thunk were evaluated, thus preventing the garbage collector from doing its work. An example would be a thunk that is supposed to update a certain value in a tree. Say this value holds on to 100MB worth of other values. Even if there is no other reference to the old tree anymore, this memory is wasted for as long as the thunk is not evaluated.
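A common place where such chains appear is an accumulator with more than one component. In the sketch below (Stats, step and mean are hypothetical names), foldl' forces the accumulator to WHNF at every step, and the strict fields make sure that this also evaluates the running count and total. With lazy fields, the same fold would leave both as growing chains of additions until the very end.

```haskell
import Data.List (foldl')

-- Accumulator with strict fields: forcing the constructor forces both numbers.
data Stats = Stats !Int !Double

step :: Stats -> Double -> Stats
step (Stats count total) x = Stats (count + 1) (total + x)

-- Arithmetic mean; note that the result is NaN for an empty list.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    Stats count total = foldl' step (Stats 0 0) xs

main :: IO ()
main = print (mean [1 .. 1000000])  -- prints: 500000.5
```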