惯用的高效 Haskell 附加?
列表和 cons 运算符 (:)
在 Haskell 中非常常见。缺点是我们的朋友。但有时我想添加到列表的末尾。
xs `append` x = xs ++ [x]
遗憾的是,这并不是一种有效的实现方式。
我在 Haskell 中写了 帕斯卡三角形,但我必须使用 ++ [x]
反习语:
ptri = [1] : mkptri ptri
mkptri (row:rows) = newRow : mkptri rows
where newRow = zipWith (+) row (0:row) ++ [1]
恕我直言,这是一个可爱可读的帕斯卡三角形等等,但反习语让我感到厌烦。有人可以向我解释一下(最好是给我指出一个很好的教程)对于您想要有效地附加到末尾的情况,惯用的数据结构是什么?我希望这个数据结构及其方法具有近乎列表般的美感。或者,或者,向我解释为什么这个反习语对于这种情况实际上并没有那么糟糕(如果你相信情况是这样的话)。
[编辑] 我最喜欢的答案是 Data.Sequence,它确实具有“接近列表的美”。不知道我对操作所需的严格性有何看法。随时欢迎进一步的建议和不同的想法。
import Data.Sequence ((|>), (<|), zipWith, singleton)
import Prelude hiding (zipWith)
ptri = singleton 1 : mkptri ptri
mkptri (seq:seqs) = newRow : mkptri seqs
where newRow = zipWith (+) seq (0 <| seq) |> 1
现在我们只需要 List 是一个类,以便其他结构可以使用它的方法,如 zipWith ,而不需要从 Prelude 中隐藏它,或者限定它。 :P
List and the cons operator (:)
are very common in Haskell. Cons is our friend. But sometimes I want to add to the end of a list instead.
xs `append` x = xs ++ [x]
This, sadly, is not an efficient way to implement it.
I wrote up Pascal's triangle in Haskell, but I had to use the ++ [x]
anti-idiom:
ptri = [1] : mkptri ptri
mkptri (row:rows) = newRow : mkptri rows
where newRow = zipWith (+) row (0:row) ++ [1]
imho, this is a lovely readable Pascal's triangle and all, but the anti-idiom irks me. Can someone explain to me (and, ideally, point me to a good tutorial) on what the idiomatic data structure is for cases where you want to append to the end efficiently? I'm hoping for near-list-like beauty in this data structure and its methods. Or, alternately, explain to me why this anti-idiom is actually not that bad for this case (if you believe such to be the case).
[edit] The answer I like the best is Data.Sequence
, which does indeed have "near-list-like beauty." Not sure how I feel about the required strictness of operations. Further suggestions and different ideas are always welcome.
import Data.Sequence ((|>), (<|), zipWith, singleton)
import Prelude hiding (zipWith)
ptri = singleton 1 : mkptri ptri
mkptri (seq:seqs) = newRow : mkptri seqs
where newRow = zipWith (+) seq (0 <| seq) |> 1
Now we just need List to be a class, so that other structures can use its methods like zipWith
without hiding it from Prelude, or qualifying it. :P
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(12)
请记住,看起来很糟糕的渐近实际上可能并非如此,因为您使用的是惰性语言。在严格的语言中,以这种方式附加到链表的末尾总是 O(n)。在惰性语言中,只有当你实际遍历到列表的末尾时,它才是 O(n),在这种情况下,无论如何你都会花费 O(n) 的努力。所以很多时候,懒惰可以拯救你。
这不是保证......例如,k 追加后跟遍历仍将在 O(nk) 中运行,而它本来可以是 O(n+k) 。但它确实在某种程度上改变了情况。当立即强制结果时,根据渐近复杂性来考虑单个操作的性能,最终并不总能给出正确的答案。
Keep in mind that what looks poor asymptotics might actually not be, because you are working in a lazy language. In a strict language, appending to the end of a linked list in this way would always be O(n). In a lazy language, it's O(n) only if you actually traverse to the end of the list,in which case you would have spent O(n) effort anyway. So in many cases, laziness saves you.
This isn't a guarantee... for example, k appends followed by a traversal will still run in O(nk) where it could have been O(n+k). But it does change the picture somewhat. Thinking about performance of single operations in terms of their asymptotic complexity when the result is immediately forced doesn't always give you the right answer in the end.
标准
序列
具有O(1)用于从“两端”添加和O(log(min(n1,n2)))用于一般连接:http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data -Sequence.html
与列表的区别在于
Sequence
是严格的Standard
Sequence
has O(1) for addition from 'both ends' and O(log(min(n1,n2))) for general concatenation:http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Sequence.html
The difference from lists though is that
Sequence
is strict像这样的显式递归可以避免您附加“反惯用语”。虽然,我认为这不像你的例子那么清楚。
Something like this explicit recursion avoids your append "anti-idiom". Although, I don't think it is as clear as your example.
在您的帕斯卡三角形代码中, ++ [x] 实际上不是问题。由于无论如何你都必须在 ++ 的左侧生成一个新列表,因此你的算法本质上是二次的;仅通过避免 ++ 不能使其渐近更快。
另外,在这种特殊情况下,当您编译 -O2 时,GHC 的列表融合规则(应该)消除 ++ 通常创建的列表的副本。这是因为在第一个参数中 zipWith 是一个好的生产者,而 ++ 是一个好的消费者。您可以在 GHC 用户的指南。
In your code for Pascal's Triangle, ++ [x] is not actually a problem. Since you have to produce a new list on the left hand side of ++ anyway, your algorithm is inherently quadratic; you cannot make it asymptotically faster merely by avoiding ++.
Also, in this particular case, when you compile -O2, GHC's list fusion rules (should) eliminate the copy of the list that ++ would normally create. This is because zipWith is a good producer and ++ is a good consumer in it's first argument. You can read about these optimizations in GHC User's Guide.
根据您的用例,
ShowS
方法(通过函数组合附加)可能有用。Depending on your use case, the
ShowS
method (appending via function composition) might be useful.如果您只想要便宜的附加(concat)和 snoc(右侧的 cons),Hughes 列表(在 Hackage 上也称为 DList)是最容易实现的。如果你想知道它们是如何工作的,可以看看 Andy Gill 和 Graham Hutton 的第一篇 Worker Wrapper 论文,John Hughes 的原始论文好像不在网上。正如其他人上面所说的,ShowS 是一个 String 专门的休斯列表/DList。
JoinList 的实现需要更多工作。这是一个二叉树,但有一个列表 API - concat 和 snoc 很便宜,你可以合理地 fmap 它:Hackage 上的 DList 有一个函子实例,但我认为它不应该有 - 函子实例必须变形进出常规列表。如果你想要一个 JoinList 那么你需要自己推出 - Hackage 上的那个是我的,它效率不高,写得也不好。
Data.Sequence 具有高效的缺点和 snoc,并且适用于其他操作 - 获取、删除等 JoinList 速度较慢的操作。由于 Data.Sequence 的内部指树实现必须平衡树,因此追加工作比其等效的 JoinList 更多。在实践中,因为 Data.Sequence 写得更好,所以我希望它的附加性能仍然优于我的 JoinList。
If you just want cheap append (concat) and snoc (cons at the right) a Hughes list, also called DList on Hackage, is the simplest to implement. If you want to know how they work, look at Andy Gill and Graham Hutton's first Worker Wrapper paper, John Hughes's original paper doesn't seem to be online. As others have said above ShowS is a String specialized Hughes list / DList.
A JoinList is a bit more work to implement. This is a binary tree but with a list API - concat and snoc are cheap and you can reasonably fmap it: the DList on Hackage has a functor instance but I contend it shouldn't have - the functor instance has to metamorph in and out of a regular list. If you want a JoinList then you'll need to roll your own - the one on Hackage is mine and it's not efficient, nor well written.
Data.Sequence has efficient cons and snoc, and is good for other operations - takes, drops etc. that a JoinList is slow for. Because the internal finger tree implementation of Data.Sequence has to balance the tree, append is more work than its JoinList equivalent. In practice because Data.Sequence is better written, I'd expect it still out-performs my JoinList for append.
另一种方法是通过使用无限列表来完全避免串联:
another way would to avoid concatenation at all by just using infinite lists:
我不一定将您的代码称为“反 idomatic”。通常,越清晰越好,即使这意味着牺牲一些时钟周期。
在您的特定情况下,最后的附加实际上并没有改变大O时间复杂度!评估表达式
将花费与长度 xs 成比例的时间,并且任何花哨的序列数据结构都不会改变这一点。如果有的话,只有常数因子会受到影响。
I wouldn't necessarily call your code "anti-idomatic". Oftentimes, clearer is better, even if that means to sacrifice a few clock cycles.
And in your particular case, the append at the end doesn't actually change the big-O time complexity! Evaluating the expression
will take time proportional
length xs
and no fancy sequence data structure is going to change that. If anything, only the constant factor will be affected.Chris Okasaki 设计了一个队列来解决这个问题。参见他的论文第15页
http://www.cs.cmu.edu/~rwh/theses/okasaki .pdf
您可能需要稍微调整代码,但是使用反向和保留列表的两部分可以让您平均更高效地工作。
另外,有人在 monad reader 中放置了一些具有高效操作的列表代码。我承认,我并没有真正理解它,但我想如果我集中注意力我就能弄清楚。原来是 Monad Reader 第 17 期中的 Douglas M. Auclair
http://themonadreader.files.wordpress.com/2011/01/issue17.pdf
我意识到上面的答案并没有直接解决问题。所以,为了咯咯笑,这是我的递归答案。随意撕开它——它并不漂亮。
Chris Okasaki has a design for a queue that addresses this issue. See page 15 of his thesis
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
You may need to adapt code slightly, but some use of reverse and keeping two pieces of the list lets you work more efficiently on average.
Also, someone put up some list code in the monad reader with efficient operations. I admit, I didn't really follow it, but I thought I could figure it out if I concentrated. It turns out it was Douglas M. Auclair in Monad Reader issue 17
http://themonadreader.files.wordpress.com/2011/01/issue17.pdf
I realized the above answer does not directly address the question. So, for giggles, here is my recursive answer. Feel free to tear it apart -- it is not pretty.
我编写了 @geekosaur 的
ShowS
方法的示例。您可以在 prelude 中看到许多ShowS
的示例。[编辑]按照@Dan的想法,我用zipWithS重写了newRow。
I wrote an example of @geekosaur's
ShowS
approach. You can see many examples ofShowS
in the prelude.[edit] As @Dan's idea, I rewrote newRow with zipWithS.
如果您正在寻找通用解决方案,那么这样怎么样:
这为映射提供了一个简单的替代定义:
我们可以为其他基于foldr的函数(例如zipWith)提供类似的定义:
再次相当容易地导出zipWith和zip:
现在如果我们使用这些通用功能,您的实现
变得很容易:
If you're looking for a general purpose solution, then how about this:
This gives a simple alternate definition for map:
We can a similar definition for other foldr-based functions, like zipWith:
Again deriving zipWith and zip fairly easily:
Now if we use these general purpose functions, your implementation
becomes pretty easy:
您可以将列表表示为从 [] 构建列表的函数,
然后您可以轻松附加列表并添加到任一端。
您可以重写 zipWith 以返回这些部分列表:
现在您可以将 ptri 写为:
更进一步,这里有一个更对称的单行:
或者这更简单:
或者没有 zipWith' (mapAccumR 在 Data.List 中):
You can represent a list as a function to build a list from []
Then you can easily append lists and add to either end.
You can rewrite zipWith to return these partial lists:
And now you can write ptri as:
Taking it further, here's a one-liner that's more symmetrical:
Or this is simpler yet:
Or without zipWith' (mapAccumR is in Data.List):