如何在 Haskell 中使 CAF 不是 CAF?
如何将常量应用形式变成,而不是常量应用形式,以阻止它在程序的整个生命周期中保留?
我已经尝试过这种方法:
-- | Dummy parameter to avoid creating a CAF
twoTrues :: () -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <$> [1..]
但它似乎不起作用 - 配置文件显示它仍然被保留并且仍然将其标记为 CAF。
我找到了一个与此相关的 Google 结果,回复作者:西蒙·佩顿·琼斯 (Simon Peyton-Jones) 写给尼尔·米切尔 (Neil Mitchell),他正是提出了这个问题 - 但不幸的是,这个答案指向了一个死链接。
How do I make a Constant Applicative Form into, well, not a Constant Applicative Form, to stop it being retained for the lifetime of the program?
I've tried this approach:
-- | Dummy parameter to avoid creating a CAF
twoTrues :: () -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <gt; [1..]
but it doesn't seem to work - the profile shows it as still being retained and still marks it as a CAF.
I've found one relevant Google result on this, a reply by Simon Peyton-Jones to Neil Mitchell who asked precisely this question - but that answer refers to a dead link, unfortunately.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
一个完整的示例
这是一个显示情况的小示例:
看起来像一个函数,对吧?但GHC 是做什么的呢?它将枚举浮动到顶层!
哎呀!
那么我们能做什么呢?
关闭优化
这可行,
-Onot
,但不可取:不要内联,以及更多功能
一切 到一个函数中,包括
enumFromTo
,将参数传递给工作人员:现在我们终于摆脱了 CAF!即使有
-O2
耶。
什么不起作用?
关闭 -ffull-lazyness
完全惰性转换使定义向外浮动。默认情况下,
-O1
或以上版本处于打开状态。让我们尝试使用-fno-full-laziness
将其关闭。然而,这不起作用。A complete example
Here's a little example that shows the situation:
Looks like a function, right? But what does GHC do? It floats the enum to the top level!
Ooops!
So what can we do?
Turn off optimizations
That works,
-Onot
, but not desirable:Don't inline, and more functons
Make everything into a function, including the
enumFromTo
, plumbing the parameter through to the workers:Now we are finally CAF-free! Even with
-O2
Yay.
What doesn't work?
Turn off -ffull-laziness
The full laziness transformation floats definitions outwards. It is on by default with
-O1
or above. Let's try turning it off with-fno-full-laziness
. However, it doesn't work.概括。如果您有一个常数值,您可以将其概括为某个变量的函数吗?问题中函数的命名
twoTrues
立即表明该常量是序列zeroTrues
,oneTrue
,中的第三个TwoTrues
、ThreeTrues
等 - 确实如此。因此,将twoTrues
概括为一个函数nTrues
,该函数采用参数 n 并删除twoTrues
,将从程序。碰巧,在这种情况下,我只考虑了程序的
zeroTrues
、oneTrue
和twoTrues
情况,因为这就是我所需要的,但我的程序自然可以扩展到处理nTrues
forn
> 2 - 因此泛化到 nTrues 意味着对zeroTrues
、oneTrue
等的用户“一路泛化”是有意义的。情况并非总是如此。注意:可能仍然有其他 CAF 需要处理,无论是在代码中,还是由 GHC 的“优化”生成(在这些病态情况下,这并不是真正的优化)。
然而,这个答案可能需要程序员做比严格必要的更多的工作。正如唐的回答所示,实际上没有必要进行概括。
另一方面,在某些情况下,概括一个常量可以使您更清楚您实际在做什么,并有助于可重用性。它甚至可以揭示以更好的系统方式和/或更有效地计算一系列值的方法。
关于这种特殊情况的注释(可以忽略):在这种特殊情况下,我不想将
nTrues
本身 变成一个无限列表(这将是一个 CAF再次,重新引入原来的问题!)而不是函数。原因之一是,虽然twoTrues
可能以无限列表的形式有用,但我看不出它对于nTrues
有什么用处(无论如何,在我的应用程序中)以无限列表的形式。Generalise. If you have a constant value, can you generalise this to a function of some variable? The naming of my function in the question,
twoTrues
, immediately suggests that this constant is the third in a sequencezeroTrues
,oneTrue
,twoTrues
,threeTrues
etc. - and indeed it is. So generalisingtwoTrues
into a functionnTrues
which takes a parameter n and deletingtwoTrues
, would eliminate one CAF from the program.As it happens, in this case, I had only considered the cases
zeroTrues
,oneTrue
andtwoTrues
for my program because that was all I needed, but my program could naturally be extended to deal withnTrues
forn
> 2 - so generalising tonTrues
would mean it would make sense to "generalise all the way up" to the users ofzeroTrues
,oneTrue
etc. That would not always be the case.Note: there might still be other CAFs to deal with, either in the code, or produced by GHC's "optimisations" (which are not really optimisations in these pathological cases).
This answer may involve more work by the programmer than is strictly necessary, however. It isn't actually necessary to generalise, as Don's answer shows.
On the other hand, in some cases, generalising a constant can make it more clear what you are actually doing, and aid reusability. It can even reveal ways to compute a series of values in a better systematic way, and/or more efficiently.
A note about this particular case (which can be ignored): In this particular case, I would not want to make
nTrues
itself into an infinite list (which would be a CAF again, reintroducing the original problem!) rather than a function. One reason is that whiletwoTrues
could be useful in the form of an infinite list, I can't see how it would be useful (in my application, anyway) fornTrues
to be in the form of an infinite list.这似乎是一个长期存在的问题 http://hackage.haskell.org/trac/ ghc/ticket/917 。在我看来,这是一个关键的问题。
It seems to be a long-standing problem http://hackage.haskell.org/trac/ghc/ticket/917 . And in my opinion a critical one.
引入虚拟参数后,您还必须使其看起来像结果实际上取决于该参数。不然的话,GHC的聪明可能又会把它变成CAF了。
我建议如下:
With the introduction of a dummy parameter, you also have to make it look like the result actually depends on the parameter. Otherwise, GHC's cleverness might turn it into a CAF again.
I suggest the following:
您需要向优化器隐藏 rhs 是 CAF 的事实。
像这样的事情应该可以做到。
You need to hide the fact that the rhs is a CAF from the optimizer.
Something like this should do it.
最简单的解决方案可能是告诉编译器内联它。 (注意:这个答案未经测试。如果它不起作用,请在下面评论。)
即使(假设)编译器由于某种原因拒绝内联它,您也可以使用
cpp
代替,通过 #定义它:(当然,尽管使用这种方法,它不会出现在模块的界面中,因此您只能在该模块中使用它)。
-cpp
选项告诉 GHC 使用 cpp 预处理输入文件。如果您在 n>1 个地方引用
twoTrues
,内联就意味着复制代码 n 次。然而,虽然这在理论上是不可取的,但在实践中可能不会成为一个重大问题。The simplest possible solution might be to tell the compiler to inline it. (Note: this answer is untested. If it doesn't work, please comment below.)
Even if (hypothetically) the compiler refuses to inline it for some reason, you can use
cpp
instead, by #defining it:(although with that approach, of course, it won't appear in the module's interface, so you can only use it within that module).
The
-cpp
option tells GHC to preprocess the input file with cpp.Inlining would mean duplicating the code n times if you refer to
twoTrues
in n>1 places. However, while that's theoretically undesirable, it's probably not going to be a significant problem in practice.每当你使用
()
作为参数时,你要说的实际上是你对它不感兴趣,因为
()
根本没有任何有趣的东西;你不会用它做任何事情,因为你不能用()
做任何事情。问题在于编译器有权对其进行优化,因为只有一个可能的值可以传递,因此它的使用总是可预测的,所以为什么不假设它呢?但它把它移回 CAF 并使得这个想法行不通。
幸运的是,还有另一种方法可以做到这一点。看一下
twoTrues
的以下修改:现在您可以像这样使用
twoTrues
:由于
a
是未使用的类型参数,因此调用者可以传递任何内容。因为你不知道它会是什么,所以你不知道你可以用它做什么。这实际上是在迫使你忽略它的价值。所以它基本上声明了我之前提到的相同声明。当然,您现在可以将任何内容(包括
undefined
)传递给该函数。但这并不重要,实际上正是这种可能性使这个技巧变得可行,因为编译器无法再预测如何使用这个函数。当人类用户看到这个函数时,他们应该知道你要在这里说什么,并得出结论,传递()
是最简单的,但即使他们不这样做并传递其他东西,它也不会破坏任何东西由于 Haskell 是惰性的,因此根本无法计算附加参数。那么如果使用
()
作为结果会怎样呢?这更糟糕。由于返回()
意味着你的函数根本不做任何事情(在 Haskell 中,函数的所有效果都应在其返回值中表示),编译器有权断定你的函数是不必要的。结论是,
()
作为类型不应出现在类型签名中,除非与其他类型一起使用(即在IO ()
中)。编辑
现在人们可能想知道,是否只有一种方法可以实现
a -> String
来自String
,为什么编译器无法断定它们是相同的。答案是你实际上有两种方法来实现这一点。对于几乎所有输入,
通常值=异常值
,但通常未定义
是“Hello World!”
,而异常未定义
> 是未定义
。从人类的角度来看,
unusual
是非常不寻常的,因为它强制评估与最终结果无关的值。如果在任何情况下您确实需要这样的东西,只需调用 seq 就会更容易。此外,由于 Haskell 默认是惰性的,如果你想定义一个严格的函数,你最好记录这个行为。因此,如果您在没有附加文档的情况下看到这样的签名,您有权假设它是以通常
方式实现的。Whenever you use
()
as a parameter, what you are going to say is actuallyYou are not interesting in it, because
()
does not have anything interesting at all; you are not going to do anything with it, because you can do nothing with()
.The problem of it is that the compiler have the right to optimise it out since there is only one possible value to pass so its use is always predictable and so why not assume it? But it moves it back to CAF and makes the idea does not work.
Fortunately, there is another way to do so. Look at the following modification of
twoTrues
:Now you can use
twoTrues
like this:Since
a
is an unused type parameter, the caller can pass anything. And because you don't know what it would be so you have no idea what you can do with it. This is actually forcing you to ignore its value. So it is basically declaring the same statement I mentioned before.Of cause, you can now pass any thing (including
undefined
) to that function. But it does not matter and actually it is this possibility makes this trick workable, since the compiler can no longer predict how this function being used. When the human user sees this function they should know what you are going to say here and conclude passing()
is the easiest, but even if they don't and passing something else, it would not break anything and since Haskell is lazy the additional parameter could never be evaluated at all.So what if
()
being used as a result? This is even worse. Since returning()
means your function do not do anything at all (in Haskell, all effects of a function shall represented in its return value), the compiler have the right to conclude your function is not necessary.The conclusion is,
()
as a type shall not appear in type signatures unless used with some other type (i.e. inIO ()
).EDIT
Now one may wonder, if there is only one way to implement
a -> String
from aString
, why the compiler cannot conclude they are the same. The answer turn out to be that you actually have two way to implement this.For almost all input,
usual value = unusual value
, butusual undefined
is"Hello World!"
whilstunusual undefined
isundefined
.In a human point of view,
unusual
is pretty unusual, because it forces to evaluate a value unrelated to the final result. If in any case you do need such a thing, simply callseq
would be easier. Furthermore, since Haskell is by default lazy, if you want to define a function that is strict, you'd better document this behavior. So if you see such a signature without additional documentation you have the right to assume it is implemented in theusual
way.