如何在 Haskell 中使 CAF 不是 CAF?

发布于 2024-11-08 20:31:16 字数 527 浏览 5 评论 0原文

如何将常量应用形式变成,而不是常量应用形式,以阻止它在程序的整个生命周期中保留?

我已经尝试过这种方法:

-- | Dummy parameter to avoid creating a CAF
twoTrues :: () -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <$> [1..]

但它似乎不起作用 - 配置文件显示它仍然被保留并且仍然将其标记为 CAF。

我找到了一个与此相关的 Google 结果,回复作者:西蒙·佩顿·琼斯 (Simon Peyton-Jones) 写给尼尔·米切尔 (Neil Mitchell),他正是提出了这个问题 - 但不幸的是,这个答案指向了一个死链接。

How do I make a Constant Applicative Form into, well, not a Constant Applicative Form, to stop it being retained for the lifetime of the program?

I've tried this approach:

-- | Dummy parameter to avoid creating a CAF
twoTrues :: () -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <
gt; [1..]

but it doesn't seem to work - the profile shows it as still being retained and still marks it as a CAF.

I've found one relevant Google result on this, a reply by Simon Peyton-Jones to Neil Mitchell who asked precisely this question - but that answer refers to a dead link, unfortunately.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

故人的歌 2024-11-15 20:31:16

一个完整的示例

这是一个显示情况的小示例:

module A where

big :: () -> [Int]
big _ = [1..10^7]

看起来像一个函数,对吧?但GHC 是做什么的呢?它将枚举浮动到顶层!

A.big1 :: [Int]
[ Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 7 0}]
A.big1 =
  case A.$wf1 10 A.big2 of ww_sDD { __DEFAULT ->
  eftInt 1 ww_sDD
  }

A.big :: () -> [Int]
[Arity=1,    
 Unf=Unf{Src=InlineStable, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)
         Tmpl= \ _ -> A.big1}]
A.big = \ _ -> A.big1

哎呀!


那么我们能做什么呢?

关闭优化

这可行,-Onot,但不可取:

A.big :: () -> [Int]
[GblId, Arity=1]
A.big =
  \ _ ->
    enumFromTo
      @ Int
      $fEnumInt
      (I# 1)
      (^
         @ Int
         @ Type.Integer
         $fNumInt
         $fIntegralInteger
         (I# 10)
         (smallInteger 7))

不要内联,以及更多功能

一切 到一个函数中,包括 enumFromTo,将参数传递给工作人员:

big :: () -> [Int]
big u = myEnumFromTo u 1 (10^7)
{-# NOINLINE big #-}

myEnumFromTo :: () -> Int -> Int -> [Int]
myEnumFromTo _ n m = enumFromTo n m
{-# NOINLINE myEnumFromTo #-}

现在我们终于摆脱了 CAF!即使有 -O2

A.myEnumFromTo [InlPrag=NOINLINE]
  :: () -> Int -> Int -> [Int]
A.myEnumFromTo =
  \ _ (n_afx :: Int) (m_afy :: Int) ->
    $fEnumInt_$cenumFromTo n_afx m_afy

A.big [InlPrag=NOINLINE] :: () -> [Int]
A.big = \ (u_abx :: ()) -> A.myEnumFromTo u_abx A.$s^2 lvl3_rEe

耶。


什么不起作用?

关闭 -ffull-lazyness

完全惰性转换使定义向外浮动。默认情况下,-O1 或以上版本处于打开状态。让我们尝试使用 -fno-full-laziness 将其关闭。然而,这不起作用。

A complete example

Here's a little example that shows the situation:

module A where

big :: () -> [Int]
big _ = [1..10^7]

Looks like a function, right? But what does GHC do? It floats the enum to the top level!

A.big1 :: [Int]
[ Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 7 0}]
A.big1 =
  case A.$wf1 10 A.big2 of ww_sDD { __DEFAULT ->
  eftInt 1 ww_sDD
  }

A.big :: () -> [Int]
[Arity=1,    
 Unf=Unf{Src=InlineStable, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)
         Tmpl= \ _ -> A.big1}]
A.big = \ _ -> A.big1

Ooops!


So what can we do?

Turn off optimizations

That works, -Onot, but not desirable:

A.big :: () -> [Int]
[GblId, Arity=1]
A.big =
  \ _ ->
    enumFromTo
      @ Int
      $fEnumInt
      (I# 1)
      (^
         @ Int
         @ Type.Integer
         $fNumInt
         $fIntegralInteger
         (I# 10)
         (smallInteger 7))

Don't inline, and more functons

Make everything into a function, including the enumFromTo, plumbing the parameter through to the workers:

big :: () -> [Int]
big u = myEnumFromTo u 1 (10^7)
{-# NOINLINE big #-}

myEnumFromTo :: () -> Int -> Int -> [Int]
myEnumFromTo _ n m = enumFromTo n m
{-# NOINLINE myEnumFromTo #-}

Now we are finally CAF-free! Even with -O2

A.myEnumFromTo [InlPrag=NOINLINE]
  :: () -> Int -> Int -> [Int]
A.myEnumFromTo =
  \ _ (n_afx :: Int) (m_afy :: Int) ->
    $fEnumInt_$cenumFromTo n_afx m_afy

A.big [InlPrag=NOINLINE] :: () -> [Int]
A.big = \ (u_abx :: ()) -> A.myEnumFromTo u_abx A.$s^2 lvl3_rEe

Yay.


What doesn't work?

Turn off -ffull-laziness

The full laziness transformation floats definitions outwards. It is on by default with -O1 or above. Let's try turning it off with -fno-full-laziness. However, it doesn't work.

哑剧 2024-11-15 20:31:16

概括。如果您有一个常数值,您可以将其概括为某个变量的函数吗?问题中函数的命名 twoTrues 立即表明该常量是序列 zeroTrues, oneTrue, 中的第三个TwoTruesThreeTrues 等 - 确实如此。因此,将 twoTrues 概括为一个函数 nTrues ,该函数采用参数 n 并删除 twoTrues,将从程序。

碰巧,在这种情况下,我只考虑了程序的 zeroTruesoneTruetwoTrues 情况,因为这就是我所需要的,但我的程序自然可以扩展到处理 nTrues for n > 2 - 因此泛化到 nTrues 意味着对 zeroTruesoneTrue 等的用户“一路泛化”是有意义的。情况并非总是如此。

注意:可能仍然有其他 CAF 需要处理,无论是在代码中,还是由 GHC 的“优化”生成(在这些病态情况下,这并不是真正的优化)。

然而,这个答案可能需要程序员做比严格必要的更多的工作。正如唐的回答所示,实际上没有必要进行概括。

另一方面,在某些情况下,概括一个常量可以使您更清楚您实际在做什么,并有助于可重用性。它甚至可以揭示以更好的系统方式和/或更有效地计算一系列值的方法。

关于这种特殊情况的注释(可以忽略):在这种特殊情况下,我不想将 nTrues 本身 变成一个无限列表(这将是一个 CAF再次,重新引入原来的问题!)而不是函数。原因之一是,虽然 twoTrues 可能以无限列表的形式有用,但我看不出它对于 nTrues 有什么用处(无论如何,在我的应用程序中)以无限列表的形式。

Generalise. If you have a constant value, can you generalise this to a function of some variable? The naming of my function in the question, twoTrues, immediately suggests that this constant is the third in a sequence zeroTrues, oneTrue, twoTrues, threeTrues etc. - and indeed it is. So generalising twoTrues into a function nTrues which takes a parameter n and deleting twoTrues, would eliminate one CAF from the program.

As it happens, in this case, I had only considered the cases zeroTrues, oneTrue and twoTrues for my program because that was all I needed, but my program could naturally be extended to deal with nTrues for n > 2 - so generalising to nTrues would mean it would make sense to "generalise all the way up" to the users of zeroTrues, oneTrue etc. That would not always be the case.

Note: there might still be other CAFs to deal with, either in the code, or produced by GHC's "optimisations" (which are not really optimisations in these pathological cases).

This answer may involve more work by the programmer than is strictly necessary, however. It isn't actually necessary to generalise, as Don's answer shows.

On the other hand, in some cases, generalising a constant can make it more clear what you are actually doing, and aid reusability. It can even reveal ways to compute a series of values in a better systematic way, and/or more efficiently.

A note about this particular case (which can be ignored): In this particular case, I would not want to make nTrues itself into an infinite list (which would be a CAF again, reintroducing the original problem!) rather than a function. One reason is that while twoTrues could be useful in the form of an infinite list, I can't see how it would be useful (in my application, anyway) for nTrues to be in the form of an infinite list.

过期以后 2024-11-15 20:31:16

这似乎是一个长期存在的问题 http://hackage.haskell.org/trac/ ghc/ticket/917 。在我看来,这是一个关键的问题。

It seems to be a long-standing problem http://hackage.haskell.org/trac/ghc/ticket/917 . And in my opinion a critical one.

冷了相思 2024-11-15 20:31:16

引入虚拟参数后,您还必须使其看起来像结果实际上取决于该参数。不然的话,GHC的聪明可能又会把它变成CAF了。

我建议如下:

twoTrues u = map (++ (True : repeat False)) . trueBlock <
gt; [(u `seq` 1)..]

With the introduction of a dummy parameter, you also have to make it look like the result actually depends on the parameter. Otherwise, GHC's cleverness might turn it into a CAF again.

I suggest the following:

twoTrues u = map (++ (True : repeat False)) . trueBlock <
gt; [(u `seq` 1)..]
终遇你 2024-11-15 20:31:16

您需要向优化器隐藏 rhs 是 CAF 的事实。
像这样的事情应该可以做到。

twoTrues :: () -> [[[Bool]]]
twoTrues u = map (++ (True : repeat (false u))) . trueBlock <
gt; [1..]

{-# NOINLINE false #-}
false :: () -> Bool
false _ = False

You need to hide the fact that the rhs is a CAF from the optimizer.
Something like this should do it.

twoTrues :: () -> [[[Bool]]]
twoTrues u = map (++ (True : repeat (false u))) . trueBlock <
gt; [1..]

{-# NOINLINE false #-}
false :: () -> Bool
false _ = False
提笔落墨 2024-11-15 20:31:16

最简单的解决方案可能是告诉编译器内联它。 (注意:这个答案未经测试。如果它不起作用,请在下面评论。)

即使(假设)编译器由于某种原因拒绝内联它,您也可以使用 cpp 代替,通过 #定义它:(

#define twoTrues (map (++ (True : repeat False)) . trueBlock <
gt; [1..])

当然,尽管使用这种方法,它不会出现在模块的界面中,因此您只能在该模块中使用它)。

-cpp 选项告诉 GHC 使用 cpp 预处理输入文件。

如果您在 n>1 个地方引用 twoTrues,内联就意味着复制代码 n 次。然而,虽然这在理论上是不可取的,但在实践中可能不会成为一个重大问题。

The simplest possible solution might be to tell the compiler to inline it. (Note: this answer is untested. If it doesn't work, please comment below.)

Even if (hypothetically) the compiler refuses to inline it for some reason, you can use cpp instead, by #defining it:

#define twoTrues (map (++ (True : repeat False)) . trueBlock <
gt; [1..])

(although with that approach, of course, it won't appear in the module's interface, so you can only use it within that module).

The -cpp option tells GHC to preprocess the input file with cpp.

Inlining would mean duplicating the code n times if you refer to twoTrues in n>1 places. However, while that's theoretically undesirable, it's probably not going to be a significant problem in practice.

你的笑 2024-11-15 20:31:16

每当你使用 () 作为参数时,你要说的实际上是

虽然我在这里声明了一个参数,但我对它是什么并不感兴趣,并且我不会对其值做任何事情。

你对它不感兴趣,因为 () 根本没有任何有趣的东西;你不会用它做任何事情,因为你不能用 () 做任何事情。

问题在于编译器有权对其进行优化,因为只有一个可能的值可以传递,因此它的使用总是可预测的,所以为什么不假设它呢?但它把它移回 CAF 并使得这个想法行不通。

幸运的是,还有另一种方法可以做到这一点。看一下 twoTrues 的以下修改:

twoTrues :: a -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <
gt; [1..]

现在您可以像这样使用 twoTrues

map concat $ twoTrues()

由于 a 是未使用的类型参数,因此调用者可以传递任何内容。因为你不知道它会是什么,所以你不知道你可以用它做什么。这实际上是在迫使你忽略它的价值。所以它基本上声明了我之前提到的相同声明。

当然,您现在可以将任何内容(包括 undefined)传递给该函数。但这并不重要,实际上正是这种可能性使这个技巧变得可行,因为编译器无法再预测如何使用这个函数。当人类用户看到这个函数时,他们应该知道你要在这里说什么,并得出结论,传递 () 是最简单的,但即使他们不这样做并传递其他东西,它也不会破坏任何东西由于 Haskell 是惰性的,因此根本无法计算附加参数。

那么如果使用 () 作为结果会怎样呢?这更糟糕。由于返回 () 意味着你的函数根本不做任何事情(在 Haskell 中,函数的所有效果都应在其返回值中表示),编译器有权断定你的函数是不必要的。

结论是,() 作为类型不应出现在类型签名中,除非与其他类型一起使用(即在 IO () 中)。

编辑

现在人们可能想知道,是否只有一种方法可以实现 a -> String 来自 String,为什么编译器无法断定它们是相同的。答案是你实际上有两种方法来实现这一点。

usual :: a -> String
usual _ = "Hello World!"

unusual :: a -> String
unusual a = seq a "Hello World!"

对于几乎所有输入,通常值=异常值,但通常未定义“Hello World!”,而异常未定义 > 是未定义

从人类的角度来看,unusual 是非常不寻常的,因为它强制评估与最终结果无关的值。如果在任何情况下您确实需要这样的东西,只需调用 seq 就会更容易。此外,由于 Haskell 默认是惰性的,如果你想定义一个严格的函数,你最好记录这个行为。因此,如果您在没有附加文档的情况下看到这样的签名,您有权假设它是以通常方式实现的。

Whenever you use () as a parameter, what you are going to say is actually

Although I have declared a parameter here, I am not interesting in what it is and I am not going to do anything with its value.

You are not interesting in it, because () does not have anything interesting at all; you are not going to do anything with it, because you can do nothing with ().

The problem of it is that the compiler have the right to optimise it out since there is only one possible value to pass so its use is always predictable and so why not assume it? But it moves it back to CAF and makes the idea does not work.

Fortunately, there is another way to do so. Look at the following modification of twoTrues:

twoTrues :: a -> [[[Bool]]]
twoTrues _ = map (++ (True : repeat False)) . trueBlock <
gt; [1..]

Now you can use twoTrues like this:

map concat $ twoTrues()

Since a is an unused type parameter, the caller can pass anything. And because you don't know what it would be so you have no idea what you can do with it. This is actually forcing you to ignore its value. So it is basically declaring the same statement I mentioned before.

Of cause, you can now pass any thing (including undefined) to that function. But it does not matter and actually it is this possibility makes this trick workable, since the compiler can no longer predict how this function being used. When the human user sees this function they should know what you are going to say here and conclude passing () is the easiest, but even if they don't and passing something else, it would not break anything and since Haskell is lazy the additional parameter could never be evaluated at all.

So what if () being used as a result? This is even worse. Since returning () means your function do not do anything at all (in Haskell, all effects of a function shall represented in its return value), the compiler have the right to conclude your function is not necessary.

The conclusion is, () as a type shall not appear in type signatures unless used with some other type (i.e. in IO ()).

EDIT

Now one may wonder, if there is only one way to implement a -> String from a String, why the compiler cannot conclude they are the same. The answer turn out to be that you actually have two way to implement this.

usual :: a -> String
usual _ = "Hello World!"

unusual :: a -> String
unusual a = seq a "Hello World!"

For almost all input, usual value = unusual value, but usual undefined is "Hello World!" whilst unusual undefined is undefined.

In a human point of view, unusual is pretty unusual, because it forces to evaluate a value unrelated to the final result. If in any case you do need such a thing, simply call seq would be easier. Furthermore, since Haskell is by default lazy, if you want to define a function that is strict, you'd better document this behavior. So if you see such a signature without additional documentation you have the right to assume it is implemented in the usual way.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文