Best strategy for a thread-safe hash table in C++
(I am interested in design of implementation NOT a readymade construct that will do it all.)
Suppose we have a class HashTable (not a hash map implemented as a tree, but an actual hash table)
and say there are eight threads.
Suppose read to write ratio is about 100:1 or even better 1000:1.
Case A) Only one thread is a writer; all threads, including the writer, can read from the HashTable (they may simply iterate over the entire hash table).
Case B) All threads are identical and all could read/write.
Can someone suggest the best strategy to make the class thread-safe, with the following considerations:
1. Top priority to least lock contention
2. Second priority to least number of locks
My understanding so far is this:
One BIG reader-writer lock (semaphore).
Specialize the semaphore so that for case B there could be eight writer-resource instances, where each writer resource locks one row (or range, for that matter).
(So I guess 1 + 8 mutexes.)
Please let me know if I am thinking on the correct line, and how could we improve on this solution.
With such high read/write ratios, you should consider a lock free solution, e.g. nbds.
EDIT:
In general, lock free algorithms work as follows:
In cases of very low contention, this is a performance win over locking algorithms since functions mostly succeed the first time through without incurring the overhead of acquiring a lock. As contention increases, the gains become more dubious.
Typically the amount of data it is possible to atomically manipulate is small - 32 or 64 bits is common - so for functions involving many reads and writes, the resulting algorithms become complex and potentially very difficult to reason about. For this reason, it is preferable to look for and adopt a mature, well-tested and well-understood third party lock free solution for your problem in preference to rolling your own.
Hashtable implementation details will depend on various aspects of the hash and table design. Do we expect to be able to grow the table? If so, we need a way to copy bulk data from the old table into the new safely. Do we expect hash collisions? If so, we need some way of walking colliding data. How do we make sure another thread doesn't delete a key/value pair between a lookup returning it and the caller making use of it? Some form of reference counting, perhaps? - but who owns the reference? - or simply copying the value on lookup? - but what if values are large?
Lock-free stacks are well understood and relatively straightforward to implement. To remove an item from the stack: get the current top, attempt to replace it with its next pointer until you succeed, and return it. To add an item: get the current top and set it as the item's next pointer, until you succeed in writing a pointer to the item as the new top. On architectures with reserve/conditional-write semantics this is enough; on architectures only supporting CAS you need to append a nonce or version number to the atomically manipulated data to avoid the ABA problem. Lock-free stacks are one way of keeping track of free space for keys/data in an atomic, lock-free manner, allowing you to reduce a key/value pair - the data actually stored in a hashtable entry - to a pointer/offset or two: a small enough amount to be manipulated using your architecture's atomic instructions. There are others.
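The push/pop loops just described amount to a Treiber stack. A minimal sketch using C++11 atomics follows; as the answer warns, this version deliberately ignores the ABA problem and never reclaims popped nodes (safe reclamation needs hazard pointers or epochs), so it is illustrative only:

```cpp
#include <atomic>
#include <utility>

// Treiber stack sketch: push/pop via CAS on the head pointer.
// NOT production-ready: vulnerable to ABA on CAS-only architectures
// and popped nodes are intentionally leaked (no safe reclamation shown).
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};

public:
    void push(T value) {
        Node* node = new Node{std::move(value),
                              head.load(std::memory_order_relaxed)};
        // Retry until our node is installed as the new top; on failure,
        // compare_exchange_weak refreshes node->next with the current top.
        while (!head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {}
    }

    bool pop(T& out) {
        Node* top = head.load(std::memory_order_acquire);
        // Retry: swing head from the observed top to its next pointer.
        while (top && !head.compare_exchange_weak(top, top->next,
                                                  std::memory_order_acquire,
                                                  std::memory_order_relaxed)) {}
        if (!top) return false;
        out = std::move(top->value);
        // Deliberately leaked: freeing here is unsafe, since another
        // thread may still be dereferencing top inside its own pop().
        return true;
    }
};
```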
Reads then become a case of: looking up the entry; checking the kvp against the requested key; doing whatever it takes to make sure the value will remain valid when we return it (taking a copy / increasing its reference count); checking the entry hasn't been modified since we began the read; returning the value if so; undoing any reference-count changes and repeating the read if not.
Writes will depend on what we're doing about collisions; in the trivial case, they are simply a case of finding the correct empty slot and writing the new kvp.
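The validate-and-retry read loop described above can be sketched with a per-entry version counter (seqlock style). This is a toy under heavy assumptions - fixed `int` keys and values, a single entry, no collision handling - and the names are mine, not from any library:

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Per-entry version counter, seqlock style: even = stable, odd = a
// write is in progress. Readers snapshot the version before and after
// reading and retry if it changed or was odd.
struct Entry {
    std::atomic<std::uint64_t> version{0};
    std::atomic<int> key{0};
    std::atomic<int> value{0};
};

inline std::optional<int> read_entry(const Entry& e, int wanted_key) {
    for (;;) {
        std::uint64_t v1 = e.version.load(std::memory_order_acquire);
        if (v1 & 1) continue;  // write in progress: retry
        int k = e.key.load(std::memory_order_relaxed);
        int val = e.value.load(std::memory_order_relaxed);
        std::uint64_t v2 = e.version.load(std::memory_order_acquire);
        if (v1 == v2) {        // entry unmodified while we read it
            if (k != wanted_key) return std::nullopt;
            return val;
        }
        // else: modified during our read, retry
    }
}

inline void write_entry(Entry& e, int key, int value) {
    e.version.fetch_add(1, std::memory_order_acq_rel);  // -> odd
    e.key.store(key, std::memory_order_relaxed);
    e.value.store(value, std::memory_order_relaxed);
    e.version.fetch_add(1, std::memory_order_release);  // -> even
}
```

A real table would need this per entry plus everything the surrounding text warns about: collision walking, reclamation, and growth.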
The above is greatly simplified and insufficient to produce your own safe implementation, especially if you are not familiar with lock-free/wait-free techniques. Possible complications include the ABA problem, priority inversion, starvation of particular threads; I have not addressed hash collisions.
The nbds page links to an excellent presentation on a real world approach that allows growth / collisions. Others exist, a quick Google finds lots of papers.
Lock free and wait free algorithms are fascinating areas of research; I encourage the reader to Google around. That said, naive lock free implementations can easily look reasonable and behave correctly much of the time while in reality being subtly unsafe. While it is important to have a solid grasp on the principles, I strongly recommend using an existing, well-understood and proven implementation over rolling your own.
You may want to look at Java's ConcurrentHashMap implementation for one possible implementation.
The basic idea is NOT to lock for every read operation but only for writes. Since in your interview they specifically mentioned an extremely high read:write ratio it makes sense trying to stuff as much overhead as possible into writes.
The ConcurrentHashMap divides the hashtable into so called "Segments" that are themselves concurrently readable hashtables and keep every single segment in a consistent state to allow traversing without locking.
When reading you basically have the usual hashmap get(), with the difference that you have to worry about reading stale values, so things like the value of the correct node, the first node of the segment table, and next pointers have to be volatile (with C++'s non-existent memory model you probably can't do this portably; C++0x should help here, but I haven't looked at it so far).
When putting a new element in there you get all the overhead, first of all having to lock the given segment. After locking it's basically a usual put() operation, but you have to guarantee atomic writes when updating the next pointer of a node (pointing to the newly created node whose next pointer has to be already correctly pointing to the old next node) or overwriting the value of a node.
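A rough C++ analogue of one such segment is sketched below, assuming C++11 release/acquire atomics stand in for Java's volatile: reads walk the bucket chain without locking, while writes serialize on the segment mutex and publish a fully constructed node with a release store. The class and names are illustrative, not taken from ConcurrentHashMap:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>

// One "segment" sketch: lock-free reads, mutex-serialized writes.
// New nodes are inserted only at the head of a bucket, with the next
// pointer set before publication, so readers never see a half-built node.
template <typename K, typename V, std::size_t Buckets = 16>
class Segment {
    struct Node {
        K key;
        V value;
        Node* next;  // fixed after publication (head insertion only)
    };
    std::atomic<Node*> heads[Buckets] = {};
    std::mutex write_mtx;

    std::size_t index(const K& key) const {
        return std::hash<K>{}(key) % Buckets;
    }

public:
    bool get(const K& key, V& out) const {
        // Lock-free read: acquire-load the head, then walk the chain.
        for (Node* n = heads[index(key)].load(std::memory_order_acquire);
             n != nullptr; n = n->next) {
            if (n->key == key) { out = n->value; return true; }
        }
        return false;
    }

    void put(const K& key, const V& value) {
        std::lock_guard<std::mutex> lock(write_mtx);  // writers pay the cost
        auto& head = heads[index(key)];
        // node->next already points at the old head before we publish it.
        Node* node = new Node{key, value,
                              head.load(std::memory_order_relaxed)};
        head.store(node, std::memory_order_release);
        // Nodes are never freed in this sketch; see the GC note below.
    }
};
```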
When growing the segment, you have to rehash the existing nodes and put them into the new, larger table. The important part is to clone nodes for the new table so as not to influence the old table (by changing their next pointers too early) until the new table is complete and replaces the old one (they use some clever trick there that means they only have to clone about 1/6 of the nodes - nice, but I'm not really sure how they reach that number).
Note that garbage collection makes this a whole lot easier, because you don't have to worry about the old nodes that weren't reused - as soon as all readers are finished, they will automatically be GCed. That's solvable without a GC, though I'm not sure what the best approach would be.
I hope the basic idea is somewhat clear - obviously there are several points that aren't trivially ported to C++, but it should give you a good idea.
No need to lock the whole table, just have a lock per bucket. That immediately gives parallelism. Inserting a new node into the table requires a lock on the bucket whose head node is about to be modified. New nodes are always added at the head of the bucket's list, so that readers can iterate through the nodes without worrying about seeing new nodes.
Each node has an r/w lock; readers iterating take a read lock. Node modification requires a write lock.
An iteration that is not holding the bucket lock but leads to a node removal must attempt to take the bucket lock; if the attempt fails, it must release its locks and retry to avoid deadlock, because the lock order differs from that of insertion.
Brief overview.
You can try atomic_hashtable for c
https://github.com/Taymindis/atomic_hashtable supports read, write, and delete without locking while multiple threads access the data. It is simple and stable; API documentation is given in the README.