Skip lists are more amenable to concurrent access and modification. Herb Sutter wrote an article about data structures in concurrent environments; it has more in-depth information.
The most frequently used implementation of a binary search tree is a red-black tree. The concurrency problems come in because when the tree is modified it often needs to rebalance. The rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized: only nodes directly linked to the affected node need to be locked.
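The locality claim can be made concrete. Below is a minimal single-threaded Python sketch (my own illustration, not from Sutter's article): the `update` array computed during the search holds exactly the predecessor nodes that a lock-based concurrent insert would need to lock; the rest of the structure is untouched.

```python
import random

MAX_LEVEL = 16
P = 0.5

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one successor pointer per level

def random_level():
    # Flip a coin to decide how tall the new node's tower is.
    lvl = 0
    while random.random() < P and lvl < MAX_LEVEL:
        lvl += 1
    return lvl

class SkipList:
    def __init__(self):
        self.head = Node(float('-inf'), MAX_LEVEL)  # sentinel
        self.level = 0

    def insert(self, key):
        # Find the predecessor of `key` at every level. These predecessors
        # are the ONLY nodes a lock-based concurrent version would need to
        # lock: no rebalancing ever touches distant parts of the structure.
        update = [self.head] * (MAX_LEVEL + 1)
        x = self.head
        for i in range(self.level, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        lvl = random_level()
        self.level = max(self.level, lvl)
        node = Node(key, lvl)
        for i in range(lvl + 1):
            # Splice in the new node: a purely local pointer swap per level.
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, key):
        x = self.head
        for i in range(self.level, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.key == key
```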
Update from Jon Harrop's comments
I read Fraser and Harris's latest paper, "Concurrent Programming Without Locks". Really good stuff if you're interested in lock-free data structures. The paper focuses on transactional memory and a theoretical operation, multiword compare-and-swap (MCAS). Both of these are simulated in software, as no hardware supports them yet. I'm fairly impressed that they were able to build MCAS in software at all.
I didn't find the transactional memory stuff particularly compelling, as it requires a garbage collector. Software transactional memory is also plagued with performance issues. However, I'd be very excited if hardware transactional memory ever becomes common. In the end it's still research and won't be of use for production code for another decade or so.
In section 8.2 they compare the performance of several concurrent tree implementations. I'll summarize their findings. It's worth downloading the PDF, as it has some very informative graphs on pages 50, 53, and 54.
Locking skip lists are insanely fast. They scale incredibly well with the number of concurrent accesses. This is what makes skip lists special; other lock-based data structures tend to croak under pressure.

Lock-free skip lists are consistently faster than locking skip lists, but only barely.

Transactional skip lists are consistently 2-3 times slower than the locking and non-locking versions.

Locking red-black trees croak under concurrent access. Their performance degrades linearly with each new concurrent user. Of the two known locking red-black tree implementations, one essentially has a global lock during tree rebalancing. The other uses fancy (and complicated) lock escalation but still doesn't significantly outperform the global-lock version.

Lock-free red-black trees don't exist (no longer true, see Update).

Transactional red-black trees are comparable with transactional skip lists. That was very surprising and very promising. Transactional memory, though slower, is far easier to write. It can be as easy as a quick search-and-replace on the non-concurrent version.
Update
Here is a paper about lock-free trees: "Lock-Free Red-Black Trees Using CAS".
I haven't looked into it deeply, but on the surface it seems solid.
First, you cannot fairly compare a randomized data structure with one that gives you worst-case guarantees.

A skip list is equivalent to a randomly balanced binary search tree (RBST) in the way that is explained in more detail in Dean and Jones' "Exploring the Duality Between Skip Lists and Binary Search Trees".

The other way around, you can also have deterministic skip lists which guarantee worst-case performance; cf. Munro et al.

Contrary to what some claim above, you can have implementations of binary search trees (BSTs) that work well in concurrent programming. A potential problem with the concurrency-focused BSTs is that you can't easily get the same hard guarantees about balancing as you would from a red-black (RB) tree. (But "standard", i.e. randomized, skip lists don't give you these guarantees either.) There's a trade-off between maintaining balancing at all times and good (and easy-to-program) concurrent access, so relaxed RB trees are usually used when good concurrency is desired. The relaxation consists in not re-balancing the tree right away. For a somewhat dated (1998) survey see Hanke's "The Performance of Concurrent Red-Black Tree Algorithms" [ps.gz].

One of the more recent improvements on these is the so-called chromatic tree (basically you have some weight such that black would be 1 and red would be zero, but you also allow values in between). And how does a chromatic tree fare against a skip list? Let's see what Brown et al., "A General Technique for Non-blocking Trees" (2014), have to say:
with 128 threads, our algorithm outperforms Java's non-blocking skiplist by 13% to 156%, the lock-based AVL tree of Bronson et al. by 63% to 224%, and a RBT that uses software transactional memory (STM) by 13 to 134 times
EDIT to add: Pugh's lock-based skip list, which was benchmarked in Fraser and Harris (2007), "Concurrent Programming Without Locks", as coming close to their own lock-free version (a point amply insisted upon in the top answer here), is also tweaked for good concurrent operation; cf. Pugh's "Concurrent Maintenance of Skip Lists", although in a rather mild way. Nevertheless, one newer (2009) paper, "A Simple Optimistic Skip-list Algorithm" by Herlihy et al., which proposes a supposedly simpler (than Pugh's) lock-based implementation of concurrent skip lists, criticized Pugh for not providing a proof of correctness convincing enough for them. Leaving aside this (maybe too pedantic) qualm, Herlihy et al. show that their simpler lock-based implementation of a skip list actually fails to scale as well as the JDK's lock-free implementation thereof, but only under high contention (50% inserts, 50% deletes and 0% lookups)... which Fraser and Harris didn't test at all; Fraser and Harris only tested 75% lookups, 12.5% inserts and 12.5% deletes (on a skip list with ~500K elements). The simpler implementation of Herlihy et al. also comes close to the lock-free solution from the JDK in the low-contention case that they tested (70% lookups, 20% inserts, 10% deletes); they actually beat the lock-free solution for this scenario when they made their skip list big enough, i.e. going from 200K to 2M elements, so that the probability of contention on any lock became negligible. It would have been nice if Herlihy et al. had gotten over their hangup over Pugh's proof and tested his implementation too, but alas they didn't do that.

EDIT2: I found a (2015-published) motherlode of all benchmarks: Gramoli's "More Than You Ever Wanted to Know about Synchronization. Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms". Here's an excerpted image relevant to this question.
"Algo.4" is a precursor (older, 2011 version) of Brown et al.'s mentioned above. (I don't know how much better or worse the 2014 version is). "Algo.26" is Herlihy's mentioned above; as you can see it gets trashed on updates, and much worse on the Intel CPUs used here than on the Sun CPUs from the original paper. "Algo.28" is ConcurrentSkipListMap from the JDK; it doesn't do as well as one might have hoped compared to other CAS-based skip list implementations. The winners under high-contention are "Algo.2" a lock-based algorithm (!!) described by Crain et al. in "A Contention-Friendly Binary Search Tree" and "Algo.30" is the "rotating skiplist" from "Logarithmic data structures for
multicores". "Algo.29" is the "No hot spot non-blocking skip
list". Be advised that Gramoli is a co-author to all three of these winner-algorithm papers. "Algo.27" is the C++ implementation of Fraser's skip list.
Gramoli's conclusion is that it's much easier to screw up a CAS-based concurrent tree implementation than it is to screw up a similar skip list. And based on the figures, it's hard to disagree. His explanation for this fact is:
The difficulty in designing a tree that is lock-free stems from the difficulty of modifying multiple references atomically. Skip lists consist of towers linked to each other through successor pointers and in which each node points to the node immediately below it. They are often considered similar to trees because each node has a successor in the successor tower and below it; however, a major distinction is that the downward pointer is generally immutable, hence simplifying the atomic modification of a node. This distinction is probably the reason why skip lists outperform trees under heavy contention as observed in Figure [above].
Overcoming this difficulty was a key concern in Brown et al.'s recent work. They have a whole separate (2013) paper, "Pragmatic Primitives for Non-blocking Data Structures", on building multi-record LL/SC compound "primitives", which they call LLX/SCX, themselves implemented using (machine-level) CAS. Brown et al. used this LLX/SCX building block in their 2014 (but not in their 2011) concurrent tree implementation.
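The immutable-down-pointer distinction from the quoted explanation can be sketched in a few lines (a hypothetical illustration, not code from any of the papers): a tower is built bottom-up so every `down` link is fixed at construction time, leaving the successor link as the only thing an atomic update ever has to change.

```python
class TowerNode:
    def __init__(self, key, down=None):
        self.key = key
        self.succ = None   # successor at this level: the only mutable link
        self.down = down   # set once at construction, never changed

def build_tower(key, height):
    # Build bottom-up so each level's `down` pointer can be frozen at
    # construction; a lock-free insert then only needs a single-word CAS
    # per level on the `succ` pointers of the predecessors.
    node = None
    for _ in range(height):
        node = TowerNode(key, down=node)
    return node  # topmost node of the tower
```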
I think it's perhaps also worth summarizing here the fundamental ideas of the "no hot spot"/contention-friendly (CF) skip list. It adapts an essential idea from the relaxed RB trees (and similar concurrency-friendly data structures): the towers are no longer built up immediately upon insertion, but delayed until there's less contention. Conversely, the deletion of a tall tower can create much contention; this was observed as far back as Pugh's 1990 concurrent skip-list paper, which is why Pugh introduced pointer reversal on deletion (a tidbit that Wikipedia's page on skip lists still doesn't mention to this day, alas). The CF skip list takes this a step further and delays deleting the upper levels of a tall tower. Both kinds of delayed operations in CF skip lists are carried out by a (CAS-based) separate garbage-collector-like thread, which its authors call the "adapting thread".

The Synchrobench code (including all algorithms tested) is available at: https://github.com/gramoli/synchrobench.

The latest Brown et al. implementation (not included in the above) is available at http://www.cs.toronto.edu/~tabrown/chromatic/ConcurrentChromaticTreeMap.java. Does anyone have a 32+ core machine available? J/K My point is that you can run these yourselves.
Also, in addition to the answers given (ease of implementation combined with comparable performance to a balanced tree), I find that implementing in-order traversal (forwards and backwards) is far simpler, because a skip list effectively has a linked list inside its implementation.
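As a sketch of that point (names here are my own, purely illustrative): in-order traversal needs nothing but the bottom-level linked list, with no stack, parent pointers, or recursion, and backwards traversal is equally trivial if the bottom level keeps back pointers.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None   # bottom-level successor
        self.prev = None   # back pointer, if the implementation keeps one

def from_sorted(keys):
    # Stand-in for the bottom level of a skip list: a doubly linked list
    # of the elements in key order, behind a head sentinel.
    head = Node(None)
    prev = head
    for k in keys:
        n = Node(k)
        prev.next, n.prev = n, prev
        prev = n
    return head, prev  # head sentinel and last node

def forward(head):
    # In-order traversal is just a walk of the bottom-level list.
    x = head.next
    while x is not None:
        yield x.key
        x = x.next

def backward(last):
    # Reverse in-order: follow back pointers, stopping at the sentinel.
    x = last
    while x.prev is not None:
        yield x.key
        x = x.prev
```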
In practice I've found that B-tree performance on my projects has worked out to be better than skip-lists. Skip lists do seem easier to understand but implementing a B-tree is not that hard.
The one advantage that I know of is that some clever people have worked out how to implement a lock-free concurrent skip list that only uses atomic operations. For example, Java 6 contains the ConcurrentSkipListMap class, and you can read the source code to it if you are crazy.
But it's not too hard to write a concurrent B-tree variant either - I've seen it done by someone else - if you preemptively split and merge nodes "just in case" as you walk down the tree then you won't have to worry about deadlocks and only ever need to hold a lock on two levels of the tree at a time. The synchronization overhead will be a bit higher but the B-tree is probably faster.
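A minimal single-threaded sketch of that preemptive-split idea (my own illustrative Python, in the CLRS style; preemptive merging for deletes is omitted): because every full child is split before we step into it, a concurrent version would only ever need to hold locks on the current node and the child it is about to enter.

```python
T = 2  # minimum degree: nodes hold between T-1 and 2T-1 keys

class BNode:
    def __init__(self, leaf):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self):
        self.root = BNode(leaf=True)

    def _split_child(self, parent, i):
        # Split the full child parent.children[i], pushing its median key
        # up into the (guaranteed non-full) parent.
        full = parent.children[i]
        right = BNode(full.leaf)
        mid = full.keys[T - 1]
        right.keys = full.keys[T:]
        full.keys = full.keys[:T - 1]
        if not full.leaf:
            right.children = full.children[T:]
            full.children = full.children[:T]
        parent.keys.insert(i, mid)
        parent.children.insert(i + 1, right)

    def insert(self, key):
        root = self.root
        if len(root.keys) == 2 * T - 1:  # split a full root "just in case"
            new_root = BNode(leaf=False)
            new_root.children.append(root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        # Walking down, split any full child BEFORE descending into it, so
        # we never need to back up; a concurrent version holds locks on at
        # most two levels (node and the child it is about to enter).
        while not node.leaf:
            i = len(node.keys)
            while i > 0 and key < node.keys[i - 1]:
                i -= 1
            if len(node.children[i].keys) == 2 * T - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        i = len(node.keys)
        while i > 0 and key < node.keys[i - 1]:
            i -= 1
        node.keys.insert(i, key)

    def contains(self, key):
        node = self.root
        while True:
            i = 0
            while i < len(node.keys) and key > node.keys[i]:
                i += 1
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]
```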
From the Wikipedia article you quoted:

Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list), provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to O(log n) search time. [...]

A skip list, upon which we have not recently performed [any such] Θ(n) operations, does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure.
EDIT: so it's a trade-off: Skip Lists use less memory at the risk that they might degenerate into an unbalanced tree.
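The coin-flip risk is easy to see empirically. This little simulation (my own, with an arbitrary seed) draws tower heights the way a standard skip list does: the tallest tower governs search cost, its expected height over n elements is about log2(n) + 1, but nothing rules out a freakishly tall (or degenerately flat) outcome; it's just exponentially unlikely.

```python
import random

def random_level(p=0.5, max_level=64):
    # Each element flips a fair coin repeatedly: heads grows its tower by
    # one level, so P(height >= k) = p**(k-1).
    lvl = 1
    while random.random() < p and lvl < max_level:
        lvl += 1
    return lvl

random.seed(42)
heights = [random_level() for _ in range(100_000)]
# With p = 1/2 and n = 100,000 elements, the expected maximum height is
# around log2(100_000) + 1, i.e. roughly 18.
print("tallest tower:", max(heights))
```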
Skip lists are implemented using lists.
Lock-free solutions exist for singly and doubly linked lists, but there are no lock-free solutions which directly use only CAS for any O(log n) data structure.

You can, however, use CAS-based lists to create skip lists.

(Note that MCAS, which is built using CAS, permits arbitrary data structures, and a proof-of-concept red-black tree has been created using MCAS.)
So, odd as they are, they turn out to be very useful :-)
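To make that concrete, here is a sketch of an insert-only sorted list driven by a single-word CAS. Python has no hardware CAS, so the `AtomicRef` class below simulates one with a lock purely for illustration; real code would use Java's `AtomicReference` or C++'s `std::atomic`. Note that deletion is the hard part (it needs Harris-style pointer marking), which is exactly where MCAS or the skip-list construction comes in.

```python
import threading

class AtomicRef:
    # Simulates a hardware CAS word. This lock exists only because plain
    # Python has no CAS primitive; the algorithm itself takes no locks.
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class LNode:
    def __init__(self, key):
        self.key = key
        self.next = AtomicRef(None)

class CASSortedList:
    def __init__(self):
        self.head = LNode(float('-inf'))  # sentinel

    def insert(self, key):
        while True:  # the standard lock-free retry loop
            pred, curr = self.head, self.head.next.get()
            while curr is not None and curr.key < key:
                pred, curr = curr, curr.next.get()
            node = LNode(key)
            node.next = AtomicRef(curr)
            # A single CAS publishes the node; if another thread raced us
            # and changed pred.next meanwhile, the CAS fails and we retry.
            if pred.next.compare_and_set(curr, node):
                return

    def to_list(self):
        out, x = [], self.head.next.get()
        while x is not None:
            out.append(x.key)
            x = x.next.get()
        return out
```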
Skip lists do have the advantage of lock striping. But the run time depends on how the level of a new node is decided. Usually this is done using Random(). On a dictionary of 56,000 words, a skip list took more time than a splay tree, and the tree took more time than a hash table. The first two could not match the hash table's runtime. Also, the array of a hash table can be lock-striped in a concurrent way too.
Skip lists and similar ordered lists are used when locality of reference is needed. For example: finding the flights before and after a given date in an application.
An in-memory binary search splay tree is great and more frequently used.

Skip List vs Splay Tree vs Hash Table runtime on dictionary find op