在急切求值中重用不可变状态的内存?
我正在研究纯函数式语言,目前正在考虑一些不可变数据的实现。
这是一个伪代码。
List a = [1 .. 10000]
List b = NewListWithoutLastElement a
b
在评估 b
时,必须在不可变数据的 eager/strict 实现中复制 b
。 但在这种情况下,a
不再在任何地方使用,因此可以安全地重新使用“a”的内存以避免复制成本。
此外,程序员可以通过使用某些关键字标记类型List
来强制编译器始终执行此操作,这些关键字表示must-be-dispose-after-using
。这使得逻辑上的编译时错误无法避免复制成本。
这可以获得巨大的性能。因为它也可以应用于巨大的对象图。
你觉得怎么样?有任何实施吗?
I'm studying purely functional language and currently thinking about some immutable data implementation.
Here is a pseudo code.
List a = [1 .. 10000]
List b = NewListWithoutLastElement a
b
When evaluating b
, b
must be copied in eager/strict implementation of immutable data.
But in this case, a
is not used anymore in any place, so memory of 'a' can be re-used safely to avoid copying cost.
Furthermore, programmer can force compiler always do this by marking the type List
with some keyword meaning must-be-disposed-after-using
. Which makes compile time error on logic cannot avoid copying cost.
This can gain huge performance. Because it can be applied to huge object graph too.
How do you think? Any implementations?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这是可能的,但范围受到严格限制。请记住,函数式程序中的绝大多数复杂值将传递给许多函数以从中提取各种属性 - 并且大多数时候,这些函数本身就是其他函数的参数,这意味着您不能做出任何假设关于他们。
例如:
这在功能代码中相当标准,并且无法在
a
上放置must-be-dispose-after-using
属性,因为没有特定位置对程序的其余部分有足够的了解。当然,您可以尝试将该信息添加到类型系统中,但是对此的类型推断绝对不是微不足道的(更不用说类型会变得相当大)。当您拥有可能在值之间共享子元素的复合对象(例如树)时,情况会变得更糟。考虑一下:
为了允许在
complex_computation_2
内重用内存,您需要证明complex_computation_1
不会更改a
,也不会存储任何部分result_1
中的a
,并在complex_computation_2
开始工作时使用a
完成。虽然前两个要求可能看起来最难,但请记住,这是一种纯函数式语言:第三个要求实际上会导致性能大幅下降,因为complex_computation_1
和complex_computation_2
不能不再在不同的线程上运行!实际上,这在绝大多数函数式语言中都不是问题,原因如下:
b
读取一个元素将从a
读取一个元素,确定它是否是最后一个,然后返回它,而不需要真正存储(cons 单元可能会分配在其中的某个位置,但这种情况在函数式编程语言中经常发生,并且短命的小对象对于 GC 来说完全没问题)。This would be possible, but severely limited in scope. Keep in mind that the vast majority of complex values in a functional program will be passed to many functions to extract various properties from them - and, most of the time, those functions are themselves arguments to other functions, which means you cannot make any assumptions about them.
For example:
This is fairly standard in functional code, and there is no way to place a
must-be-disposed-after-using
attribute ona
because no specific location has enough knowledge about the rest of the program. Of course, you could try adding that information to the type system, but type inference on this is decidedly non-trivial (not to mention that types would grow quite large).Things get even worse when you have compound objects, such as trees, that might share sub-elements between values. Consider this:
In order to allow memory reuse within
complex_computation_2
, you would need to prove thatcomplex_computation_1
does not altera
, does not store any part ofa
withinresult_1
and is done usinga
by the timecomplex_computation_2
starts working. While the two first requirements might seem the hardest, keep in mind that this is a pure functional language: the third requirement actually causes a massive performance drop becausecomplex_computation_1
andcomplex_computation_2
cannot be run on different threads anymore!In practice, this is not an issue in the vast majority of functional languages, for three reasons:
NewListWithoutFirstElement
already provides full reuse of the memory of the transformed list without any effort. It's fairly common for functional programmers (and any kind of programmers, really) to determine their use of data structures based on performance considerations, and rewriting a "remove last" algorithm as a "remove first" algorithm is kind of easy.b
in your example would read one element froma
, determine if it's the last, and return it without really requiring storage (a cons cell would probably be allocated somewhere in there, but this happens all the time in functional programming languages and short-lived small objects are perfectly fine with the GC).