I've been thinking for a while about how to go about implementing a deque (that is, a double-ended queue) as an immutable data structure.
There seem to be different ways of doing this. AFAIK, immutable data structures are generally hierarchical, so that major parts of them can be reused after modifying operations such as the insertion or removal of an item.
Eric Lippert has two articles on his blog about this topic, along with sample implementations in C#.
Both of his implementations strike me as more elaborate than is actually necessary. Couldn't deques simply be implemented as binary trees, where elements can only be inserted or removed on the very "left" (the front) and on the very "right" (the back) of the tree?
o
/ \
… …
/ \
… …
/ \ / \
front --> L … … R <-- back
Additionally, the tree would be kept reasonably balanced with rotations:
- right rotations upon insertion at the front or upon removal from the back, and
- left rotations upon removal from the front or insertion at the back.
Eric Lippert is, in my opinion, a very smart person whom I deeply respect, yet he apparently didn't consider this approach. Thus I wonder, was it for a good reason? Is my suggested way of implementing deques naïve?
As Daniel noted, implementing immutable deques with well-known balanced search trees like AVL or red-black trees gives Θ(lg n) worst case complexity. Some of the implementations Lippert discusses may seem elaborate at first glance, but there are many immutable deques with o(lg n) worst or average or amortized complexity that are built from balanced trees along with two simple ideas:
To perform deque operations on a traditional balanced search tree, we need access to the ends, but we only have access to the center. To get to the left end, we must navigate left child pointers until we finally reach a dead end. It would be preferable to have a pointer to the left and right ends without all that navigation effort. In fact, we really don't need access to the root node very often. Let's store a balanced search tree so that access to the ends is O(1).
Here is an example in C of how you might normally store an AVL tree:
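The C sample this sentence refers to is missing from the page; below is a minimal sketch of what a conventional AVL node likely looked like (field names are illustrative):

```c
#include <stddef.h>

/* A node of an ordinary AVL tree: its height, a value, and
   pointers down to its two children. */
struct AVLNode {
    int height;                  /* height of the subtree rooted here */
    char *value;
    struct AVLNode *leftChild;
    struct AVLNode *rightChild;
};
```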
To set up the tree so that we can start at the edges and move towards the root, we change the tree and store all of the pointers along the paths from the root to the left and rightmost children in reverse. (These paths are called the left and right spine, respectively). Just like reversing a singly-linked list, the last element becomes the first, so the leftmost child is now easily accessible.
This is a little tricky to understand. To help explain it, imagine that you only did this for the left spine:
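The sample that belongs here is also missing; a sketch of a reversed left spine (the `LeftSpine` name comes from the next paragraph; the other names are illustrative):

```c
#include <stddef.h>

/* Ordinary AVL nodes hang off the spine. */
struct AVLNode {
    int height;
    char *value;
    struct AVLNode *leftChild;
    struct AVLNode *rightChild;
};

/* The left spine stored in reverse: instead of pointing down to a
   left child, each spine entry points up to its parent, so the
   leftmost element is the entry point. */
struct LeftSpine {
    int height;
    char *value;
    struct AVLNode *rightChild;  /* subtree hanging off the spine */
    struct LeftSpine *parent;    /* reversed pointer, toward the root */
};
```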
In some sense, the leftmost child is now the "root" of the tree. If you drew the tree this way, it would look very strange, but if you simply take your normal drawing of a tree and reverse all of the arrows on the left spine, the meaning of the LeftSpine struct should become clearer. Access to the left side of the tree is now immediate. The same can be done for the right spine:
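The corresponding sample is missing as well; a sketch assuming a `RightSpine` that mirrors `LeftSpine`, plus a top-level structure holding both reversed spines and the center element (names are illustrative):

```c
#include <stddef.h>

/* Ordinary AVL nodes hang off both spines. */
struct AVLNode {
    int height;
    char *value;
    struct AVLNode *leftChild;
    struct AVLNode *rightChild;
};

/* Left spine stored in reverse: each entry points up to its parent. */
struct LeftSpine {
    int height;
    char *value;
    struct AVLNode *rightChild;  /* subtree hanging off the spine */
    struct LeftSpine *parent;    /* reversed pointer, toward the root */
};

/* Mirror image for the right spine. */
struct RightSpine {
    int height;
    char *value;
    struct AVLNode *leftChild;   /* subtree hanging off the spine */
    struct RightSpine *parent;   /* reversed pointer, toward the root */
};

/* Both reversed spines plus the center (root) element give
   O(1) access to either end of the deque. */
struct SpineDeque {
    struct LeftSpine *leftmost;
    char *center;
    struct RightSpine *rightmost;
};
```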
If you store both a left and a right spine as well as the center element, you have immediate access to both ends. Inserting and deleting may still be Ω(lg n), because rebalancing operations may require traversing the entire left or right spine, but simply viewing the leftmost and rightmost elements is now O(1).
An example of this strategy is used to make purely functional treaps, with implementations in SML and Java. This is also a key idea in several other immutable deques with o(lg n) performance.
As noted above, inserting at the left or rightmost end of an AVL tree can require Ω(lg n) time for traversing the spine. Here is an example of an AVL tree that demonstrates this:
A full binary tree is defined by induction as:
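The inline definition is missing from the page; by the standard induction, a full binary tree of height 0 is empty, and a full binary tree of height h+1 is a node whose two children are full binary trees of height h. A sketch (this code is mine, added to fill the gap):

```haskell
-- A binary tree with no data, just shape.
data Tree = Empty | Node Tree Tree deriving (Show, Eq)

-- Build the full binary tree of a given height by the induction above.
fullTree :: Int -> Tree
fullTree 0 = Empty
fullTree h = let child = fullTree (h - 1) in Node child child

-- A full binary tree of height h has 2^h - 1 nodes.
size :: Tree -> Int
size Empty      = 0
size (Node l r) = 1 + size l + size r
```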
Pushing an element onto the left of a full binary tree will necessarily increase the maximum height of the tree. Since the AVL trees above store that information in each node, and since every tree along the left spine of a full binary tree is also a full binary tree, pushing an element onto the left of an AVL deque that happens to be a full binary tree will require incrementing Ω(lg n) height values along the left spine.
(Two notes on this: (a) You can store AVL trees without keeping the height in the node; instead you keep only balance information (left-taller, right-taller, or even). This doesn't change the performance of the above example. (b) In AVL trees, you might need to do not only Ω(lg n) balance or height information updates, but Ω(lg n) rebalancing operations. I don't recall the details of this, and it may be only on deletions, rather than insertions.)
In order to achieve o(lg n) deque operations, we need to limit this work. Immutable deques represented by balanced trees usually use at least one of the following strategies:
Anticipate where rebalancing will be needed. If you are using a tree that requires o(lg n) rebalancing but you know where that rebalancing will be needed and you can get there quickly enough, you can perform your deque operations in o(lg n) time. Deques that use this as a strategy will store not just two pointers into the deque (the ends of the left and right spines, as discussed above), but some small number of jump pointers to places higher along the spines. Deque operations can then access the roots of the trees pointed to by the jump pointers in O(1) time. If o(lg n) jump pointers are maintained for all of the places where rebalancing (or changing node information) will be needed, deque operations can take o(lg n) time.
(Of course, this makes the tree actually a dag, since the trees on the spines pointed to by jump pointers are also pointed to by their children on the spine. Immutable data structures don't usually get along with non-tree graphs, since replacing a node pointed to by more than one other node requires replacing all of the other nodes that point to it. I have seen this fixed by just eliminating the non-jump pointers, turning the dag back into a tree. One can then store a singly-linked list with jump pointers as a list of lists. Each subordinate list contains all of the nodes between the head of that list and its jump pointer. This requires some care to deal with partially overlapping jump pointers, and a full explanation is probably not appropriate for this aside.)
This is one of the tricks used by Tsakalidis in his paper "AVL Trees for localized search" to allow O(1) deque operations on AVL trees with a relaxed balance condition. It is also the main idea used by Kaplan and Tarjan in their paper "Purely functional, real-time deques with catenation" and a later refinement of that by Mihaesau and Tarjan. Munro et al.'s "Deterministic Skip Lists" also deserves a mention here, though translating skip lists to an immutable setting by using trees sometimes changes the properties that allow such efficient modification near the ends. For examples of the translation, see Messeguer's "Skip trees, an alternative data structure to Skip lists in a concurrent approach", Dean and Jones's "Exploring the duality between skip lists and binary search trees", and Lamoureux and Nickerson's "On the Equivalence of B-trees and deterministic skip lists".
Do the work in bulk. In the full binary tree example above, no rebalancing is needed on a push, but Ω(lg n) nodes need to have their height or balance information updated. Instead of actually doing the incrementation, you could simply mark the spine at the ends as needing incrementation.
One way to understand this process is by analogy to binary numbers. (2^n)-1 is represented in binary by a string of n 1's. When adding 1 to this number, you need to change all of the 1's to 0's and then add a 1 at the end. The following Haskell encodes binary numbers as non-empty strings of bits, least significant first.
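The Haskell sample this sentence announces is missing; a sketch of the encoding it describes, written to match the `(One, replicate k One)` form quoted just below (constructor names follow that quote; the rest is illustrative):

```haskell
data Bit = Zero | One deriving (Show, Eq)

-- A binary number: a non-empty string of bits, least significant first,
-- represented as a head bit plus the remaining bits.
type Binary = (Bit, [Bit])

-- Add one. A trailing run of k Ones forces k recursive carries.
incr :: Binary -> Binary
incr (Zero, rest) = (One, rest)
incr (One, [])    = (Zero, [One])
incr (One, b:bs)  = let (b', bs') = incr (b, bs)
                    in (Zero, b' : bs')
```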
incr is a recursive function, and for numbers of the form (One, replicate k One), incr calls itself Ω(k) times. Instead, we might represent groups of equal bits by only the number of bits in the group. Neighboring bits or groups of bits are combined into one group if they are equal (in value, not in number). We can increment in O(1) time:
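The `segIncr` code is missing from the page; a sketch of the run-length representation and an O(1) increment over it (names are illustrative; `Bit` repeats the type from the previous sketch):

```haskell
data Bit = Zero | One deriving (Show, Eq)

-- A group of equal adjacent bits: the bit value and how many times it
-- repeats. Neighboring groups always hold different bit values.
data Bits = Bits { lsb :: Bit, count :: Int } deriving (Show, Eq)

-- A number: a non-empty list of groups, least significant first.
type SegBinary = (Bits, [Bits])

-- Every case touches only the first group or two, so segIncr is O(1).
segIncr :: SegBinary -> SegBinary
segIncr (Bits Zero 1, [])                = (Bits One 1, [])
segIncr (Bits Zero 1, Bits One n : rest) = (Bits One (n+1), rest)
segIncr (Bits Zero n, rest)              = (Bits One 1, Bits Zero (n-1) : rest)
segIncr (Bits One n, [])                 = (Bits Zero n, [Bits One 1])
segIncr (Bits One n, Bits Zero 1 : Bits One m : rest)
                                         = (Bits Zero n, Bits One (m+1) : rest)
segIncr (Bits One n, Bits Zero p : rest) = (Bits Zero n, Bits One 1 : Bits Zero (p-1) : rest)
```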
Since segIncr is not recursive and doesn't call any functions other than plus and minus on Ints, you can see it takes O(1) time.
Some of the deques mentioned in the section above entitled "Anticipate where rebalancing will be needed" actually use a different numerically-inspired technique called "redundant number systems" to limit the rebalancing work to O(1) and locate it quickly. Redundant numerical representations are fascinating, but possibly too far afield for this discussion. Elmasry et al.'s "Strictly-regular number system and data structures" is not a bad place to start reading about that topic. Hinze's "Bootstrapping one-sided flexible arrays" may also be useful.
In "Making data structures persistent", Driscoll et al. describe lazy recoloring, which they attribute to Tsakalidis. They apply it to red-black trees, which can be rebalanced after insertion or deletion with O(1) rotations (but Ω(lg n) recolorings) (see Tarjan's "Updating a balanced tree in O(1) rotations"). The core of the idea is to mark a large path of nodes that need to be recolored but not rotated. A similar idea is used on AVL trees in the older versions of Brown & Tarjan's "A fast merging algorithm". (Newer versions of the same work use 2-3 trees; I have not read the newer ones and I do not know if they use any techniques like lazy recoloring.)
Randomize. Treaps, mentioned above, can be implemented in a functional setting so that they perform deque operations in O(1) time on average. Since deques do not need to inspect their elements, this average is not susceptible to malicious input degrading performance, unlike simple (no rebalancing) binary search trees, which are fast on average input. Treaps use an independent source of random bits instead of relying on randomness from the data.
In a persistent setting, treaps may be susceptible to degraded performance from malicious input with an adversary who can both (a) use old versions of a data structure and (b) measure the performance of operations. Because they do not have any worst-case balance guarantees, treaps can become quite unbalanced, though this should happen rarely. If an adversary waits for a deque operation that takes a long time, she can initiate that same operation repeatedly in order to measure and take advantage of a possibly unbalanced tree.
If this is not a concern, treaps are an attractively simple data structure. They are very close to the AVL spine tree described above.
Skip lists, mentioned above, might also be amenable to functional implementations with O(1) average-time deque operations.
The first two techniques for bounding the rebalancing work require complex modifications to data structures while usually affording a simple analysis of the complexity of deque operations. Randomization, along with the next technique, has simpler data structures but more complex analysis. The original analysis by Seidel and Aragon is not trivial, and there is some complex analysis of exact probabilities using more advanced mathematics than is present in the papers cited above -- see Flajolet et al.'s "Patterns in random binary search trees".
Amortize. There are several balanced trees that, when viewed from the roots up (as in the spine reversal explained above), offer O(1) amortized insertion and deletion time. Individual operations can take Ω(lg n) time, but they put the tree in such a nice state that a large number of operations following the expensive operation will be cheap.
Unfortunately, this kind of analysis does not work when old versions of the tree are still around. A user can perform operations on the old, nearly-out-of-balance tree many times without any intervening cheap operations.
One way to get amortized bounds in a persistent setting was invented by Chris Okasaki. It is not simple to explain how the amortization survives the ability to use arbitrary old versions of a data structure, but if I remember correctly, Okasaki's first (as far as I know) paper on the subject has a pretty clear explanation. For more comprehensive explanations, see his thesis or his book.
As I understand it, there are two essential ingredients. First, instead of just guaranteeing that a certain number of cheap operations occur before each expensive operation (the usual approach to amortization), you actually designate and set up that specific expensive operation before performing the cheap operations that will pay for it. In some cases, the operation is scheduled to be started (and finished) only after many intervening cheap steps. In other cases, the operation is actually scheduled only O(1) steps in the future, but cheap operations may do part of the expensive operation and then reschedule more of it for later. An adversary looking to repeat an expensive operation over and over again is thus actually reusing the same scheduled operation each time. This sharing is where the second ingredient comes in.
The computation is set up using laziness. A lazy value is not computed immediately, but, once computed, its result is saved. The first time a client needs to inspect a lazy value, its value is computed. Later clients can use that cached value directly, without having to recompute it.
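A tiny illustration of this caching behavior in Haskell (this example is mine, not from the original answer):

```haskell
import Debug.Trace (trace)

-- 'expensive' is a suspension: the trace message and the sum are only
-- produced when the value is first demanded, and the result is cached.
expensive :: Integer
expensive = trace "computing..." (sum [1 .. 1000000])

main :: IO ()
main = do
  print expensive  -- forces the suspension; "computing..." appears at most once
  print expensive  -- reuses the cached result, with no recomputation
```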
Laziness constructs are included in some MLs, and Haskell is lazy by default. Under the hood, laziness is a mutation, which leads some authors to call it a "side effect". That might be considered bad if that kind of side effect doesn't play well with whatever the reasons were for selecting an immutable data structure in the first place, but, on the other hand, thinking of laziness as a side effect allows the application of traditional amortized analysis techniques to persistent data structures, as mentioned in a paper by Kaplan, Okasaki, and Tarjan entitled "Simple Confluently Persistent Catenable Lists".
Consider again the adversary from above who is attempting to repeatedly force the computation of an expensive operation. After the first force of the lazy value, every remaining force is cheap.
In his book, Okasaki explains how to build deques with O(1) amortized time required for each operation. It is essentially a B+-tree, which is a tree where all of the elements are stored at the leaves, nodes may vary in how many children they have, and every leaf is at the same depth. Okasaki uses the spine-reversal method discussed above, and he suspends (that is, stores as a lazy value) the spines above the leaf elements.
A structure by Hinze and Paterson called "Finger trees: a simple general-purpose data structure" is halfway between the deques designed by Okasaki and the "Purely functional representations of catenable sorted lists" of Kaplan and Tarjan. Hinze and Paterson's structure has become very popular.
As evidence of how tricky the amortized analysis is to understand, Hinze and Paterson's finger trees are frequently implemented without laziness, making the time bounds not O(1) but still O(lg n). One implementation that seems to use laziness is the one in functional-dotnet. That project also includes an implementation of lazy values in C# which might help explain them if my explanation above is lacking.
Could deques be implemented as binary trees? Yes, and their worst-case complexity when used persistently would be no worse than those presented by Eric Lippert. However, Eric's trees are actually not complicated enough to get O(1) deque operations in a persistent setting, though only by a small complexity margin (making the center lazy) if you are willing to accept amortized performance. A different but also simple view of treaps can get O(1) expected performance in a functional setting, assuming an adversary who is not too tricky. Getting O(1) worst-case deque operations with a tree-like structure in a functional setting requires a good bit more complexity than Eric's implementations.
Two final notes (though this is a very interesting topic and I reserve the right to add more later) :-)
Nearly all of the deques mentioned above are finger search trees as well. In a functional setting this means they can be split at the ith element in O(lg(min(i,n-i))) time and two trees of size n and m can be concatenated in O(lg(min(n,m))) time.
I know of two ways of implementing deques that don't use trees. Okasaki presents one in his book and thesis and the paper I linked to above. The other uses a technique called "global rebuilding" and is presented in Chuang and Goldberg's "Real-time deques, multihead Turing machines, and purely functional programming".
If you use a balanced binary tree, insertions and removals on both ends are O(lg N) (both average and worst case).
The approach used in Eric Lippert's implementations is more efficient, running in constant time in the average case (the worst case still is O(lg N)).
Remember that modifying an immutable tree involves rewriting all parents of the node you are modifying. So for a deque, you do not want the tree to be balanced; instead you want the L and R nodes to be as close to the root as possible, whereas nodes in the middle of the tree can be further away.
The other answers are all awesome. I will add to them that I chose the finger tree implementation of a deque because it makes an unusual and interesting use of the generic type system. Most data structures are recursive in their structure, but this technique puts the recursion also in the type system which I had not seen before; I thought it might be of general interest.
Absolutely. A modified version of a height-balanced tree, AVL trees in particular, would be very easy to implement. However, it means filling a tree-based queue with n elements requires O(n lg n) time -- you should shoot for a deque which has performance characteristics similar to those of its mutable counterpart.

You can create a straightforward immutable deque with amortized constant-time operations for all major operations using two stacks, a left and a right stack. PushLeft and PushRight correspond to pushing values on the left and right stack respectively. You can get Deque.Hd and Deque.Tl from LeftStack.Hd and LeftStack.Tl; if your LeftStack is empty, set LeftStack = RightStack.Reverse and RightStack = empty stack.
This is a very common implementation, and it's very easy to optimize for better performance.
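The two-stack scheme described above can be sketched as follows (a hedged sketch in Haskell; `pushLeft`/`popLeft` stand in for the PushLeft and Deque.Hd/Deque.Tl operations named above):

```haskell
-- Two-stack deque: each stack is a plain list. Pushes are O(1);
-- popping from an empty side reverses the other stack, which is O(n)
-- but amortized O(1) in ephemeral use.
data Deque a = Deque [a] [a] deriving (Show, Eq)

empty :: Deque a
empty = Deque [] []

pushLeft, pushRight :: a -> Deque a -> Deque a
pushLeft  x (Deque l r) = Deque (x : l) r
pushRight x (Deque l r) = Deque l (x : r)

-- Head and tail of the left end; an empty left stack is refilled by
-- reversing the right stack, as described above.
popLeft :: Deque a -> Maybe (a, Deque a)
popLeft (Deque [] [])   = Nothing
popLeft (Deque [] r)    = popLeft (Deque (reverse r) [])
popLeft (Deque (x:l) r) = Just (x, Deque l r)
```

A symmetric `popRight` follows the same pattern with the roles of the two stacks swapped.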