Haskell (GHC) 中的列表是如何实现的?
我只是对 Haskell 中列表的一些具体实现细节感到好奇(GHC 特定的答案很好)——它们是朴素的链表,还是有任何特殊的优化?更具体地说:
length
和(!!)
(例如)是否必须迭代列表?- 如果是这样,它们的值是否以任何方式缓存(即,如果我调用
length
两次,是否必须迭代两次)? - 访问列表后面是否涉及迭代整个列表?
- 无限列表和列表推导式是否被记忆? (即,对于 fib = 1:1:zipWith (+) fib (tail fib),每个值是否会递归计算,或者是否依赖于先前的计算值?)
任何其他有趣的实现细节将不胜感激。提前致谢!
I was just curious about some exact implementation details of lists in Haskell (GHC-specific answers are fine)--are they naive linked lists, or do they have any special optimizations? More specifically:
- Do
length
and(!!)
(for instance) have to iterate through the list? - If so, are their values cached in any way (i.e., if I call
length
twice, will it have to iterate both times)? - Does access to the back of the list involve iterating through the whole list?
- Are infinite lists and list comprehensions memoized? (i.e., for
fib = 1:1:zipWith (+) fib (tail fib)
, will each value be computed recursively, or will it rely on the previous computed value?)
Any other interesting implementation details would be much appreciated. Thanks in advance!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
Haskell 中的列表没有特殊的操作处理。它们的定义如下:
仅使用一些特殊符号:
[a]
表示List a
,[]
表示Nil
> 和(:)
表示缺点
。如果您定义了相同的操作并重新定义了所有操作,您将获得完全相同的性能。因此,Haskell 列表是单链接的。由于惰性,它们经常被用作迭代器。
sum [1..n]
在常量空间中运行,因为此列表中未使用的前缀会随着求和的进行而被垃圾收集,并且在需要时才会生成尾部。至于 #4:Haskell 中的所有值都会被记忆,但函数不为其参数保留记忆表。因此,当您像您一样定义
fib
时,结果将被缓存,并且第 n 个斐波那契数将在 O(n) 时间内被访问。但是,如果您以这种明显等效的方式定义它:(花点时间注意与您的定义的相似性)
那么结果不会共享,并且第 n 个斐波那契数将在 O(fib n) (指数)时间内访问。您可以说服函数与记忆库共享,例如 data-memocombinators。
Lists have no special operational treatment in Haskell. They are defined just like:
Just with some special notation:
[a]
forList a
,[]
forNil
and(:)
forCons
. If you defined the same and redefined all the operations, you would get the exact same performance.Thus, Haskell lists are singly-linked. Because of laziness, they are often used as iterators.
sum [1..n]
runs in constant space, because the unused prefixes of this list are garbage collected as the sum progresses, and the tails aren't generated until they are needed.As for #4: all values in Haskell are memoized, with the exception that functions do not keep a memo table for their arguments. So when you define
fib
like you did, the results will be cached and the nth fibonacci number will be accessed in O(n) time. However, if you defined it in this apparently equivalent way:(Take a moment to note the similarity to your definition)
Then the results are not shared and the nth fibonacci number will be accessed in O(fib n) (which is exponential) time. You can convince functions to be shared with a memoization library like data-memocombinators.
据我所知(我不知道其中有多少是 GHC 特定的)
length
和(!!)
必须遍历列表.我认为列表没有任何特殊的优化,但有一种适用于所有数据类型的技术。
如果你有类似的东西
那么
长度xs
将被计算两次。但是如果你有
那么它只会计算一次。
是的。
是的,一旦命名值的一部分被计算出来,它就会被保留,直到名称超出范围。
(该语言不需要这样做,但这就是我理解实现行为的方式。) 是
As far as I know (I don't know how much of this is GHC-specific)
length
and(!!)
DO have to iterate through the list.I don't think there are any special optimisations for lists, but there is a technique that applies to all datatypes.
If you have something like
then
length xs
will be computed twice.But if instead you have
then it will only be computed once.
Yes.
Yes, once part of a named value is computed, it is retained until the name goes out of scope.
(The language doesn't require this, but this is how I understand the implementations behave.)
GHC 不会执行完整的公共子表达式消除。例如:
在
-ddump-simpl
上给出:请注意,
aaaaaaaaa
调用GHC.List.$wlen
两次。(事实上,由于
x
需要保留在aaaaaaaaa
中,因此它比bbbbbbbbbb
慢 2 倍以上。)GHC does not perform full Common Subexpression Elimination. For example:
Gives on
-ddump-simpl
:Note that
aaaaaaaaa
callsGHC.List.$wlen
twice.(In fact, because
x
needs to be retained inaaaaaaaaa
, it is more than 2x slower thanbbbbbbbbb
.)