Haskell function definition and caching arrays

Posted on 2024-07-09 09:06:33


I have a question about implementing caching (memoization) using arrays in Haskell. The following pattern works:

f = (fA !)
  where fA = listArray...

But this does not (the speed of the program suggests that the array is getting recreated each call or something):

f n = (fA ! n)
  where fA = listArray...

Defining fA outside of a where clause (in "global scope") also works with either pattern.
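For concreteness, here is a runnable reconstruction of the working pattern. The question's listArray body is elided, so the Fibonacci table below is purely illustrative:

```haskell
import Data.Array (Array, listArray, (!))

-- Illustrative memo table (the question's actual array body is elided).
-- The point-free definition means fA is built once and shared by all calls.
fib :: Int -> Integer
fib = (fA !)
  where
    fA :: Array Int Integer
    fA = listArray (0, 90) [ go i | i <- [0 .. 90] ]
    go 0 = 0
    go 1 = 1
    go i = fib (i - 1) + fib (i - 2)

main :: IO ()
main = print (fib 50)
```

Compiled with GHC, fib 50 returns immediately; changing the first line to fib n = fA ! n makes the same program exponentially slower without optimization, which matches the behaviour described above.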

I was hoping that someone could point me towards a technical explanation of what the difference between the above two patterns is.

Note that I am using the latest GHC, and I'm not sure if this is just a compiler peculiarity or part of the language itself.

EDIT: ! is used for array access, so fA ! 5 means fA[5] in C++ syntax. My understanding of Haskell is that (fA !) n would be the same as (fA ! n)...also it would have been more conventional for me to have written "f n = fA ! n" (without the parentheses). Anyway, I get the same behaviour no matter how I parenthesize.


3 Answers

旧伤慢歌 2024-07-16 09:06:33


The difference in behavior is not specified by the Haskell standard. All it has to say is that the functions are the same (will result in the same output given the same input).

However in this case there is a simple way to predict time and memory performance that most compilers adhere to. Again I stress that this is not essential, only that most compilers do it.

First rewrite your two examples as pure lambda expressions, expanding the section:

f = let fA = listArray ... in \n -> fA ! n
f' = \n -> let fA = listArray ... in fA ! n

Compilers use let bindings to indicate sharing. The guarantee is that in a given environment (a set of local variables, a lambda body, something like that), the right side of a let binding with no parameters will be evaluated at most once. The environment of fA in the former is the whole program, since it is not under any lambda; the environment of the latter is smaller, since it is under a lambda.

What this means is that in the latter, fA may be evaluated once for each different n, whereas in the former this is forbidden.
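A quick way to observe this (a sketch; the trace messages and array contents are my own) is to mark the array's construction with Debug.Trace:

```haskell
import Data.Array (listArray, (!))
import Debug.Trace (trace)

-- Shared: fA is not under the lambda, so "building" prints at most once.
shared :: Int -> Int
shared = (fA !)
  where fA = trace "building shared table" (listArray (0, 10) [0 .. 10])

-- Rebuilt: fA is scoped under the implicit \n, so without optimization the
-- message is typically printed on every call.
rebuilt :: Int -> Int
rebuilt n = fA ! n
  where fA = trace "building rebuilt table" (listArray (0, 10) [0 .. 10])

main :: IO ()
main = do
  print (shared 1 + shared 2)   -- one "building shared table"
  print (rebuilt 1 + rebuilt 2) -- usually two "building rebuilt table"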

We can see this pattern in effect even with multi-argument functions:

g x y = (a !! y) where a = [ x ^ y' | y' <- [0..] ]
g' x = (\y -> a !! y) where a = [ x ^ y' | y' <- [0..] ]

Then in:

let k = g 2 in k 100 + k 100

We might compute 2^100 more than once, but in:

let k = g' 2 in k 100 + k 100

We will only compute it once.
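This can be made into a self-contained program (using (!!) since a is a list here; the base 2 and exponent 100 come from the example above):

```haskell
-- g rebuilds the list a on every full application; g' builds it once per
-- partial application (g' 2) and shares it across every y.
g :: Integer -> Int -> Integer
g x y = a !! y
  where a = [ x ^ y' | y' <- [0 ..] ]

g' :: Integer -> Int -> Integer
g' x = \y -> a !! y
  where a = [ x ^ y' | y' <- [0 ..] ]

main :: IO ()
main = do
  let k  = g 2
      k' = g' 2
  print (k 100 + k 100)   -- 2^100 may be computed twice
  print (k' 100 + k' 100) -- 2^100 computed once, then reused
```

Both print the same number; they differ only in how much work most compilers will repeat.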

If you are doing work with memoization, I recommend data-memocombinators on Hackage, which is a library of memo tables of different shapes, so you don't have to roll your own.

世界和平 (World Peace) 2024-07-16 09:06:33


The best way to find what is going on is to tell the compiler to output its intermediate representation with -v4. The output is voluminous and a bit hard to read, but should allow you to find out exactly what the difference in the generated code is, and how the compiler arrived there.

You will probably notice that fA is being moved outside the function (to the "global scope") on your first example. On your second example, it probably is not (meaning it will be recreated on each call).

One possible reason for it not being moved outside the function would be that the compiler thinks it depends on the value of n. In your working example, there is no n for fA to depend on.

But the reason I think the compiler is avoiding moving fA outside on your second example is that it is trying to avoid a space leak. Consider what would happen if fA, instead of your array, were an infinite list (on which you used the !! operator). Imagine you called it once with a large number (for instance f 10000), and later only called it with small numbers (f 2, f 3, f 12...). The 10000 elements from the earlier call are still in memory, wasting space. So, to avoid this, the compiler creates fA again every time you call your function.

The space-leak avoidance probably does not happen in your first example because in that case f is in fact only called once, returning a closure (we are now at the frontier of the pure functional and the imperative worlds, so things get a bit more subtle). This closure replaces the original function, which will never be called again, so fA is only evaluated once (and thus the optimizer feels free to move it outside the function). In your second example, f does not get replaced by a closure (since its value depends on the argument), and thus fA will get created again.
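The retention scenario described above can be sketched like this (the table's contents are assumed, not from the question):

```haskell
-- A top-level (shared) infinite memo list: once f 10000 has been forced,
-- all ~10000 cells stay reachable through `table` for the program's lifetime.
table :: [Integer]
table = [ 2 ^ i | i <- [0 :: Int ..] ]

f :: Int -> Integer
f n = table !! n

main :: IO ()
main = do
  f 10000 `seq` return ()  -- walks and retains the first 10001 cells
  print (f 2)              -- later lookups are cheap,
  print (f 3)              -- but the big prefix is still live
```

Moving table into a where clause under f's argument trades this retention for recomputation on every call, which is exactly the trade-off the answer describes.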

If you want to try to understand more of this (which will help reading the -v4 output), you could take a look at the Spineless Tagless G-Machine paper (citeseer link).

As to your final question, I think it is a compiler peculiarity (but I could be wrong). However, it would not surprise me if all compilers did the same thing, even if it were not part of the language.

嘦怹 2024-07-16 09:06:33


Cool, thank you for your answers, which helped a lot; I will definitely check out data-memocombinators on Hackage. Coming from a C++-heavy background, I've been struggling to understand exactly what Haskell will do (mainly in terms of complexity) with a given program, which tutorials don't seem to get into.
