Haskell (GHC) 中的列表是如何实现的?

发布于 2024-08-30 04:34:24 字数 378 浏览 2 评论 0原文

我只是对 Haskell 中列表的一些具体实现细节感到好奇(GHC 特定的答案很好)——它们是朴素的链表,还是有任何特殊的优化?更具体地说:

  1. length(!!) (例如)是否必须迭代列表?
  2. 如果是这样,它们的值是否以任何方式缓存(即,如果我调用 length 两次,是否必须迭代两次)?
  3. 访问列表后面是否涉及迭代整个列表?
  4. 无限列表和列表推导式是否被记忆? (即,对于 fib = 1:1:zipWith (+) fib (tail fib),每个值是否会递归计算,或者是否依赖于先前的计算值?)

任何其他有趣的实现细节将不胜感激。提前致谢!

I was just curious about some exact implementation details of lists in Haskell (GHC-specific answers are fine)--are they naive linked lists, or do they have any special optimizations? More specifically:

  1. Do length and (!!) (for instance) have to iterate through the list?
  2. If so, are their values cached in any way (i.e., if I call length twice, will it have to iterate both times)?
  3. Does access to the back of the list involve iterating through the whole list?
  4. Are infinite lists and list comprehensions memoized? (i.e., for fib = 1:1:zipWith (+) fib (tail fib), will each value be computed recursively, or will it rely on the previous computed value?)

Any other interesting implementation details would be much appreciated. Thanks in advance!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

踏雪无痕 2024-09-06 04:34:24

Haskell 中的列表没有特殊的操作处理。它们的定义如下:

data List a = Nil | Cons a (List a)

仅使用一些特殊符号:[a] 表示 List a[] 表示 Nil > 和 (:) 表示缺点。如果您定义了相同的操作并重新定义了所有操作,您将获得完全相同的性能。

因此,Haskell 列表是单链接的。由于惰性,它们经常被用作迭代器。 sum [1..n] 在常量空间中运行,因为此列表中未使用的前缀会随着求和的进行而被垃圾收集,并且在需要时才会生成尾部。

至于 #4:Haskell 中的所有值都会被记忆,但函数不为其参数保留记忆表。因此,当您像您一样定义 fib 时,结果将被缓存,并且第 n 个斐波那契数将在 O(n) 时间内被访问。但是,如果您以这种明显等效的方式定义它:(

-- Simulate infinite lists as functions from Integer
type List a = Int -> a

cons :: a -> List a -> List a
cons x xs n | n == 0    = x
            | otherwise = xs (n-1)

tailF :: List a -> List a
tailF xs n = xs (n+1)

fib :: List Integer
fib = 1 `cons` (1 `cons` (\n -> fib n + tailF fib n))

花点时间注意与您的定义的相似性)

那么结果不会共享,并且第 n 个斐波那契数将在 O(fib n) (指数)时间内访问。您可以说服函数与记忆库共享,例如 data-memocombinators

Lists have no special operational treatment in Haskell. They are defined just like:

data List a = Nil | Cons a (List a)

Just with some special notation: [a] for List a, [] for Nil and (:) for Cons. If you defined the same and redefined all the operations, you would get the exact same performance.

Thus, Haskell lists are singly-linked. Because of laziness, they are often used as iterators. sum [1..n] runs in constant space, because the unused prefixes of this list are garbage collected as the sum progresses, and the tails aren't generated until they are needed.

As for #4: all values in Haskell are memoized, with the exception that functions do not keep a memo table for their arguments. So when you define fib like you did, the results will be cached and the nth fibonacci number will be accessed in O(n) time. However, if you defined it in this apparently equivalent way:

-- Simulate infinite lists as functions from Integer
type List a = Int -> a

cons :: a -> List a -> List a
cons x xs n | n == 0    = x
            | otherwise = xs (n-1)

tailF :: List a -> List a
tailF xs n = xs (n+1)

fib :: List Integer
fib = 1 `cons` (1 `cons` (\n -> fib n + tailF fib n))

(Take a moment to note the similarity to your definition)

Then the results are not shared and the nth fibonacci number will be accessed in O(fib n) (which is exponential) time. You can convince functions to be shared with a memoization library like data-memocombinators.

辞别 2024-09-06 04:34:24

据我所知(我不知道其中有多少是 GHC 特定的)

  1. length(!!) 必须遍历列表.

  2. 我认为列表没有任何特殊的优化,但有一种适用于所有数据类型的技术。

    如果你有类似的东西

    foo xs = bar(长度xs)++ baz(长度xs)
    

    那么长度xs将被计算两次。

    但是如果你有

    foo xs = bar len ++ baz len
      其中 len = 长度 xs
    

    那么它只会计算一次。

  3. 是的。

  4. 是的,一旦命名值的一部分被计算出来,它就会被保留,直到名称超出范围。
    (该语言不需要这样做,但这就是我理解实现行为的方式。) 是

As far as I know (I don't know how much of this is GHC-specific)

  1. length and (!!) DO have to iterate through the list.

  2. I don't think there are any special optimisations for lists, but there is a technique that applies to all datatypes.

    If you have something like

    foo xs = bar (length xs) ++ baz (length xs)
    

    then length xs will be computed twice.

    But if instead you have

    foo xs = bar len ++ baz len
      where len = length xs
    

    then it will only be computed once.

  3. Yes.

  4. Yes, once part of a named value is computed, it is retained until the name goes out of scope.
    (The language doesn't require this, but this is how I understand the implementations behave.)

我不在是我 2024-09-06 04:34:24

如果是这样,它们的值是否以任何方式缓存(即,如果我调用 length 两次,是否必须迭代两次)?

GHC 不会执行完整的公共子表达式消除。例如:

{-# NOINLINE aaaaaaaaa #-}
aaaaaaaaa :: [a] -> Int
aaaaaaaaa x = length x + length x

{-# NOINLINE bbbbbbbbb #-}
bbbbbbbbb :: [a] -> Int
bbbbbbbbb x = l + l where l = length x

main = bbbbbbbbb [1..2000000] `seq` aaaaaaaaa [1..2000000] `seq` return ()

-ddump-simpl 上给出:

Main.aaaaaaaaa [NEVER Nothing] :: forall a_adp.
                                  [a_adp] -> GHC.Types.Int
GblId
[Arity 1
 NoCafRefs
 Str: DmdType Sm]
Main.aaaaaaaaa =
  \ (@ a_ahc) (x_adq :: [a_ahc]) ->
    case GHC.List.$wlen @ a_ahc x_adq 0 of ww_anf { __DEFAULT ->
    case GHC.List.$wlen @ a_ahc x_adq 0 of ww1_Xnw { __DEFAULT ->
    GHC.Types.I# (GHC.Prim.+# ww_anf ww1_Xnw)
    }
    }

Main.bbbbbbbbb [NEVER Nothing] :: forall a_ado.
                                  [a_ado] -> GHC.Types.Int
GblId
[Arity 1
 NoCafRefs
 Str: DmdType Sm]
Main.bbbbbbbbb =
  \ (@ a_adE) (x_adr :: [a_adE]) ->
    case GHC.List.$wlen @ a_adE x_adr 0 of ww_anf { __DEFAULT ->
    GHC.Types.I# (GHC.Prim.+# ww_anf ww_anf)
    }

请注意,aaaaaaaaa 调用 GHC.List.$wlen 两次。

(事实上​​,由于 x 需要保留在 aaaaaaaaa 中,因此它比 bbbbbbbbbb 慢 2 倍以上。)

If so, are their values cached in any way (i.e., if I call length twice, will it have to iterate both times)?

GHC does not perform full Common Subexpression Elimination. For example:

{-# NOINLINE aaaaaaaaa #-}
aaaaaaaaa :: [a] -> Int
aaaaaaaaa x = length x + length x

{-# NOINLINE bbbbbbbbb #-}
bbbbbbbbb :: [a] -> Int
bbbbbbbbb x = l + l where l = length x

main = bbbbbbbbb [1..2000000] `seq` aaaaaaaaa [1..2000000] `seq` return ()

Gives on -ddump-simpl:

Main.aaaaaaaaa [NEVER Nothing] :: forall a_adp.
                                  [a_adp] -> GHC.Types.Int
GblId
[Arity 1
 NoCafRefs
 Str: DmdType Sm]
Main.aaaaaaaaa =
  \ (@ a_ahc) (x_adq :: [a_ahc]) ->
    case GHC.List.$wlen @ a_ahc x_adq 0 of ww_anf { __DEFAULT ->
    case GHC.List.$wlen @ a_ahc x_adq 0 of ww1_Xnw { __DEFAULT ->
    GHC.Types.I# (GHC.Prim.+# ww_anf ww1_Xnw)
    }
    }

Main.bbbbbbbbb [NEVER Nothing] :: forall a_ado.
                                  [a_ado] -> GHC.Types.Int
GblId
[Arity 1
 NoCafRefs
 Str: DmdType Sm]
Main.bbbbbbbbb =
  \ (@ a_adE) (x_adr :: [a_adE]) ->
    case GHC.List.$wlen @ a_adE x_adr 0 of ww_anf { __DEFAULT ->
    GHC.Types.I# (GHC.Prim.+# ww_anf ww_anf)
    }

Note that aaaaaaaaa calls GHC.List.$wlen twice.

(In fact, because x needs to be retained in aaaaaaaaa, it is more than 2x slower than bbbbbbbbb.)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文