Data.Sequence 与 Data.DList 将数据附加到列表末尾

发布于 2024-11-15 16:42:51 字数 249 浏览 1 评论 0原文

我正在编写一些需要经常附加到列表末尾的代码。我知道使用“++”效率很低。因此，我通过附加到头部来向后构建列表，然后在完成后反转它。我认为这是一种常见的初学者策略。

我宁愿一开始就按照正确的顺序构建它 - 但这意味着切换到新的数据结构。我正在考虑将 Data.Sequence 或 Data.DList 用于我的容器。我的列表由严格的 int 对组成，我不需要随机访问它。 Data.Sequence 和 Data.DList 的相对优点是什么？我还应该考虑其他容器吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

哆兒滾 2024-11-22 16:42:51

使用 Data.Sequence 还是 DList 取决于您将如何使用结果列表。当您构建序列（例如在 Writer 计算中）以在最后转换为列表并使用它时，DList 非常有用。但是，如果您需要使用中间结果，例如：

f (foo ++ bar)
+ f (foo ++ bar ++ baz)
+ f (foo ++ bar ++ baz ++ quux)

那么 DList 就非常糟糕，因为它每次都需要重新计算脊柱。在这种情况下，Data.Sequence 是更好的选择。如果您需要从序列中删除元素，Data.Sequence 也更好。

但也许您甚至不需要做出这个决定。在计算结束时反转列表在 ML 和 Scheme 等严格的函数式语言中很常见，但在 Haskell 中却不常见。以map的两种编写方式为例：

map_i f xs = reverse $ go [] xs
    where
    go accum [] = accum
    go accum (x:xs) = go (f x : accum) xs

map_ii f [] = []
map_ii f (x:xs) = f x : map_ii f xs

在严格的语言中，map_ii会很糟糕，因为它使用线性堆栈空间，而map_i > 是尾递归。但由于 Haskell 很懒，所以 map_i 效率较低。 map_ii 可以消耗输入的一个元素并产生输出的一个元素，而 map_i 在产生任何输出之前消耗整个输入。

尾递归并不是 Haskell 中高效实现的圣杯。当生成像列表这样的数据结构时，您实际上想要协同递归；也就是说，在构造函数的应用程序下进行递归调用（例如上面的 fx : map_ii f xs）。

因此，如果您发现自己在尾递归函数之后发生逆转，请看看是否可以将整个过程分解为核心递归函数。

Whether to use Data.Sequence or DList depends on how you are going to be using the resulting list. DList is great when you are building up a sequence, say in a Writer computation, to convert to a list at the end and use it. However, if you need to use the intermediate results, like, say:

f (foo ++ bar)
+ f (foo ++ bar ++ baz)
+ f (foo ++ bar ++ baz ++ quux)

then DList is pretty bad, because it needs to recompute the spine each time. Data.Sequence is a better choice in this situation. Data.Sequence is also better if you need to remove elements from the sequence.

But maybe you don't even need to make this decision. Reversing lists at the end of a computation is common in strict functional languages like ML and Scheme, but not in Haskell. Take, for example, these two ways of writing map:

map_i f xs = reverse $ go [] xs
    where
    go accum [] = accum
    go accum (x:xs) = go (f x : accum) xs

map_ii f [] = []
map_ii f (x:xs) = f x : map_ii f xs

In a strict language, map_ii would be horrible because it uses linear stack space, whereas map_i is tail recursive. But because Haskell is lazy, map_i is the inefficient one. map_ii can consume one element of the input and yield one element of the output, whereas map_i consumes the whole input before yielding any output.

Tail recursion isn't the holy grail of efficient implementation in Haskell. When producing a data structure like a list, you actually want to be co-recursive; that is, make the recursive call underneath an application of a constructor (eg. f x : map_ii f xs above).

So if you find yourself reversing after a tail-recursive function, see if you can factor the whole lot into a corecursive function.

回复收藏 0 原文

盛夏已如深秋| 2024-11-22 16:42:51

我做了一个简单的标准比较：

let mdata = replicate 1000 (replicate 100 (13 :: Int))
let f1 = foldl' (++) []
let fr = foldr (++) []
let f2 = foldl' (\s next -> s Seq.|> next) mempty
let f3 = foldl' (DL.snoc) mempty

defaultMain [
    bgroup "time encode" [ bench "++ - foldl"   $ nf (AE.encode . f1) mdata
                         , bench "++ - foldr"   $ nf (AE.encode . fr) mdata
                         , bench "Seq |>"  $ nf (AE.encode . toList . f2) mdata
                         ,  bench "DList snoc"  $ nf (AE.encode . DL.toList . f3) mdata]
  ]

结果：

benchmarking time encode/++ - foldl
time                 604.1 ms   (570.0 ms .. 654.6 ms)
                     0.999 R²   (NaN R² .. 1.000 R²)
mean                 619.2 ms   (608.9 ms .. 625.6 ms)
std dev              9.761 ms   (0.0 s .. 11.21 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking time encode/++ - foldr
time                 2.554 ms   (2.503 ms .. 2.610 ms)
                     0.995 R²   (0.990 R² .. 0.998 R²)
mean                 2.584 ms   (2.547 ms .. 2.628 ms)
std dev              134.1 μs   (111.7 μs .. 186.6 μs)
variance introduced by outliers: 34% (moderately inflated)

benchmarking time encode/Seq |>
time                 2.020 ms   (1.986 ms .. 2.049 ms)
                     0.997 R²   (0.995 R² .. 0.998 R²)
mean                 2.106 ms   (2.078 ms .. 2.138 ms)
std dev              105.8 μs   (85.60 μs .. 130.5 μs)
variance introduced by outliers: 34% (moderately inflated)

benchmarking time encode/DList snoc
time                 1.992 ms   (1.952 ms .. 2.034 ms)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 2.030 ms   (2.000 ms .. 2.060 ms)
std dev              97.88 μs   (82.60 μs .. 122.3 μs)
variance introduced by outliers: 34% (moderately inflated)

结论：使用Data.Sequence。它具有最大的灵活性，但性能仅比 DList 低一点 - 它可能不值得。

I did a simple criterion comparison:

let mdata = replicate 1000 (replicate 100 (13 :: Int))
let f1 = foldl' (++) []
let fr = foldr (++) []
let f2 = foldl' (\s next -> s Seq.|> next) mempty
let f3 = foldl' (DL.snoc) mempty

defaultMain [
    bgroup "time encode" [ bench "++ - foldl"   $ nf (AE.encode . f1) mdata
                         , bench "++ - foldr"   $ nf (AE.encode . fr) mdata
                         , bench "Seq |>"  $ nf (AE.encode . toList . f2) mdata
                         ,  bench "DList snoc"  $ nf (AE.encode . DL.toList . f3) mdata]
  ]

With the result:

benchmarking time encode/++ - foldl
time                 604.1 ms   (570.0 ms .. 654.6 ms)
                     0.999 R²   (NaN R² .. 1.000 R²)
mean                 619.2 ms   (608.9 ms .. 625.6 ms)
std dev              9.761 ms   (0.0 s .. 11.21 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking time encode/++ - foldr
time                 2.554 ms   (2.503 ms .. 2.610 ms)
                     0.995 R²   (0.990 R² .. 0.998 R²)
mean                 2.584 ms   (2.547 ms .. 2.628 ms)
std dev              134.1 μs   (111.7 μs .. 186.6 μs)
variance introduced by outliers: 34% (moderately inflated)

benchmarking time encode/Seq |>
time                 2.020 ms   (1.986 ms .. 2.049 ms)
                     0.997 R²   (0.995 R² .. 0.998 R²)
mean                 2.106 ms   (2.078 ms .. 2.138 ms)
std dev              105.8 μs   (85.60 μs .. 130.5 μs)
variance introduced by outliers: 34% (moderately inflated)

benchmarking time encode/DList snoc
time                 1.992 ms   (1.952 ms .. 2.034 ms)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 2.030 ms   (2.000 ms .. 2.060 ms)
std dev              97.88 μs   (82.60 μs .. 122.3 μs)
variance introduced by outliers: 34% (moderately inflated)

Conclusion: Use Data.Sequence. It has the most flexibility while just a notch lower performance than DList - it's probably not worth it.

回复收藏 0 原文

~没有更多了~