在急切求值中重用不可变状态的内存？

发布于 2024-10-19 16:53:26 字数 462 浏览 3 评论 0原文

我正在研究纯函数式语言，目前正在考虑一些不可变数据的实现。

这是一个伪代码。

List a = [1 .. 10000]
List b = NewListWithoutLastElement a
b

在评估 b 时，必须在不可变数据的 eager/strict 实现中复制 b。但在这种情况下，a 不再在任何地方使用，因此可以安全地重新使用“a”的内存以避免复制成本。

此外，程序员可以通过使用某些关键字标记类型List来强制编译器始终执行此操作，这些关键字表示must-be-dispose-after-using。这使得逻辑上的编译时错误无法避免复制成本。

这可以获得巨大的性能。因为它也可以应用于巨大的对象图。

你觉得怎么样？有任何实施吗？

原文

I'm studying purely functional language and currently thinking about some immutable data implementation.

Here is a pseudo code.

List a = [1 .. 10000]
List b = NewListWithoutLastElement a
b

When evaluating b, b must be copied in eager/strict implementation of immutable data.
But in this case, a is not used anymore in any place, so memory of 'a' can be re-used safely to avoid copying cost.

Furthermore, programmer can force compiler always do this by marking the type List with some keyword meaning must-be-disposed-after-using. Which makes compile time error on logic cannot avoid copying cost.

This can gain huge performance. Because it can be applied to huge object graph too.

How do you think? Any implementations?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

橘香 2024-10-26 16:53:26

这是可能的，但范围受到严格限制。请记住，函数式程序中的绝大多数复杂值将传递给许多函数以从中提取各种属性 - 并且大多数时候，这些函数本身就是其他函数的参数，这意味着您不能做出任何假设关于他们。

例如：

let map2 f g x = f x, g x 

let apply f = 
  let a = [1 .. 10000]
  f a 

// in another file :

apply (map2 NewListWithoutLastElement NewListWithoutFirstElement)

这在功能代码中相当标准，并且无法在 a 上放置 must-be-dispose-after-using 属性，因为没有特定位置对程序的其余部分有足够的了解。当然，您可以尝试将该信息添加到类型系统中，但是对此的类型推断绝对不是微不足道的（更不用说类型会变得相当大）。

当您拥有可能在值之间共享子元素的复合对象（例如树）时，情况会变得更糟。考虑一下：

let a = binary_tree [ 1; 2; 5; 7; 9 ]
let result_1 = complex_computation_1 (insert a 6)
let result_2 = complex_computation_2 (remove a 5)

为了允许在 complex_computation_2 内重用内存，您需要证明 complex_computation_1 不会更改 a，也不会存储任何部分result_1 中的 a，并在 complex_computation_2 开始工作时使用 a 完成。虽然前两个要求可能看起来最难，但请记住，这是一种纯函数式语言：第三个要求实际上会导致性能大幅下降，因为 complex_computation_1 和 complex_computation_2 不能不再在不同的线程上运行！

实际上，这在绝大多数函数式语言中都不是问题，原因如下：

它们有专门为此构建的垃圾收集器。对他们来说，分配新内存并回收废弃的内存比尝试重用现有内存更快。在绝大多数情况下，这已经足够快了。
他们拥有已经实现数据共享的数据结构。例如，NewListWithoutFirstElement 已经无需任何努力就可以完全重用转换后的列表的内存。对于函数式程序员（实际上是任何类型的程序员）来说，根据性能考虑来确定数据结构的使用是相当常见的，并且将“删除最后一个”算法重写为“删除第一个”算法很容易。
惰性求值已经做了等价的事情：惰性列表的尾部最初只是一个闭包，如果需要，可以对尾部进行求值，因此没有内存可以重用。另一方面，这意味着在示例中从 b 读取一个元素将从 a 读取一个元素，确定它是否是最后一个，然后返回它，而不需要真正存储（cons 单元可能会分配在其中的某个位置，但这种情况在函数式编程语言中经常发生，并且短命的小对象对于 GC 来说完全没问题）。

This would be possible, but severely limited in scope. Keep in mind that the vast majority of complex values in a functional program will be passed to many functions to extract various properties from them - and, most of the time, those functions are themselves arguments to other functions, which means you cannot make any assumptions about them.

For example:

let map2 f g x = f x, g x 

let apply f = 
  let a = [1 .. 10000]
  f a 

// in another file :

apply (map2 NewListWithoutLastElement NewListWithoutFirstElement)

This is fairly standard in functional code, and there is no way to place a must-be-disposed-after-using attribute on a because no specific location has enough knowledge about the rest of the program. Of course, you could try adding that information to the type system, but type inference on this is decidedly non-trivial (not to mention that types would grow quite large).

Things get even worse when you have compound objects, such as trees, that might share sub-elements between values. Consider this:

let a = binary_tree [ 1; 2; 5; 7; 9 ]
let result_1 = complex_computation_1 (insert a 6)
let result_2 = complex_computation_2 (remove a 5)

In order to allow memory reuse within complex_computation_2, you would need to prove that complex_computation_1 does not alter a, does not store any part of a within result_1 and is done using a by the time complex_computation_2 starts working. While the two first requirements might seem the hardest, keep in mind that this is a pure functional language: the third requirement actually causes a massive performance drop because complex_computation_1 and complex_computation_2 cannot be run on different threads anymore!

In practice, this is not an issue in the vast majority of functional languages, for three reasons:

They have a garbage collector built specifically for this. It is faster for them to just allocate new memory and reclaim the abandoned one, rather than try to reuse existing memory. In the vast majority of cases, this will be fast enough.
They have data structures that already implement data sharing. For instance, NewListWithoutFirstElement already provides full reuse of the memory of the transformed list without any effort. It's fairly common for functional programmers (and any kind of programmers, really) to determine their use of data structures based on performance considerations, and rewriting a "remove last" algorithm as a "remove first" algorithm is kind of easy.
Lazy evaluation already does something equivalent: a lazy list's tail is initially just a closure that can evaluate the tail if you need to—so there's no memory to be reused. On the other hand, this means that reading an element from b in your example would read one element from a, determine if it's the last, and return it without really requiring storage (a cons cell would probably be allocated somewhere in there, but this happens all the time in functional programming languages and short-lived small objects are perfectly fine with the GC).

回复收藏 0 原文

~没有更多了~