同步访问双向链表
我正在尝试在 pthreads 环境中用 C 实现一个(特殊类型的)双向链表,但仅使用 C 包装的同步指令(如原子 CAS 等)而不是 pthread 原语。 (列表的元素是固定大小的内存块,几乎肯定不能容纳 pthread_mutex_t
等。)我实际上不需要完整的任意双向链表方法,只需要:
- 插入 中任意点的列表末尾
- 从列表开头删除列表
- 根据指向要删除的成员的指针,
,该指针是从除遍历列表之外的源获取的。因此,也许描述此数据结构的更好方法是队列/fifo,并且可以删除队列中的项目。
是否有一个标准方法来同步这个?我陷入了可能的死锁问题,其中一些问题可能是所涉及的算法所固有的,而另一些问题可能源于这样一个事实:我试图在一个有限的空间中工作,而我能做的事情受到其他限制。
编辑:特别是,如果要同时删除相邻对象,我该怎么办?据推测,在删除对象时,您需要获取列表中前一个和下一个对象的锁,并更新它们的 next/prev 指针以指向彼此。但如果任何一个邻居已经被锁定,这将导致死锁。我试图找出一种方法,使任何/所有发生的删除都可以遍历列表的锁定部分,并确定当前正在删除过程中的最大子列表,然后锁定与该子列表相邻的节点,以便整个子列表被整体删除,但我的头开始受伤了.. :-P
结论(?):为了跟进,我确实有一些代码想要开始工作,但我'我对理论问题也感兴趣。每个人的答案都非常有帮助,并结合了我在这里表达的约束之外的详细信息(您真的不想知道指向要删除的元素的指针来自哪里以及其中涉及的同步!)我决定暂时放弃本地锁代码并专注于:
- 使用大量较小的列表,每个列表都有单独的锁。
- 最大限度地减少持有锁的指令数量,并在获取锁之前(以安全的方式)访问内存,以减少持有锁时发生页面错误和缓存未命中的可能性。
- 测量人为高负载下的争用并评估该方法是否令人满意。
再次感谢所有给出答案的人。如果我的实验进展不顺利,我可能会回到概述的方法(尤其是弗拉德的)并重试。
I'm trying to implement a (special kind of) doubly-linked list in C, in a pthreads environment but using only C-wrapped synchronization instructions like atomic CAS, etc. rather than pthread primitives. (The elements of the list are fixed-size chunks of memory and almost surely cannot fit pthread_mutex_t
etc. inside them.) I don't actually need full arbitrary doubly-linked list methods, only:
- insertion at the end of the list
- deletion from the beginning of the list
- deletion at arbitrary points in the list based on a pointer to the member to be removed, which was obtained from a source other than by traversing the list.
So perhaps a better way to describe this data structure would be a queue/fifo with the possibility of removing items mid-queue.
Is there a standard approach to synchronizing this? I'm getting stuck on possible deadlock issues, some of which are probably inherent to the algorithms involved and others of which might stem from the fact that I'm trying to work in a confined space with other constraints on what I can do.
Edit: In particular, I'm stuck on what to do if adjacent objects are to be removed simultaneously. Presumably when removing an object, you need to obtain locks on both the previous and next objects in the list and update their next/prev pointers to point to one another. But if either neighbor is already locked, this would result in a deadlock. I've tried to work out a way that any/all of the removals taking place could walk the locked part of the list and determine the maximal sublist that's currently in the process of removal, then lock the nodes adjacent to that sublist so that the whole sublist gets removed as a whole, but my head is starting to hurt.. :-P
Conclusion(?): To follow up, I do have some code I want to get working, but I'm also interested in the theoretical problem. Everyone's answers have been quite helpful, and combined with details of the constraints outside what I expressed here (you really don't want to know where the pointer-to-element-to-be-removed came from and the synchronization involved there!) I've decided to abandon the local-lock code for now and focus on:
- using a larger number of smaller lists which each have individual locks.
- minimizing the number of instructions over which locks are held and poking at memory (in a safe way) prior to acquiring a lock to reduce the possibility of page faults and cache misses while a lock is held.
- measuring the contention under artificially-high load and evaluating whether this approach is satisfactory.
Thanks again to everybody who gave answers. If my experiment doesn't go well I might come back to the approaches outlined (especially Vlad's) and try again.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
为什么不直接应用粗粒度锁呢?只需锁定整个队列即可。
更复杂的(但不一定更有效,取决于您的使用模式)解决方案是使用读写锁,分别用于读取和写入。
在我看来,使用无锁操作对于您的情况来说并不是一个好主意。想象一下某个线程正在遍历您的队列,同时“当前”项目被删除。无论您的遍历算法拥有多少额外链接,所有这些项目都可能被删除,因此您的代码将没有机会完成遍历。
比较和交换的另一个问题是,使用指针,您永远不知道它是否真的指向同一个旧结构,或者旧结构已被释放,并且在同一地址分配了一些新结构。对于您的算法来说,这可能是也可能不是问题。
对于“本地”锁定的情况(即,可以单独锁定每个列表项),一个想法是使锁有序。对锁进行排序可确保不会出现死锁。所以你的操作是这样的:
通过指针 p 删除到上一个项目:
插入开头:
这似乎是正确的,但我没有尝试这个想法。
本质上,这使得双链表像单链表一样工作。
如果您没有指向前一个列表元素的指针(当然通常是这种情况,因为实际上不可能使这样的指针保持一致状态),您可以执行以下操作:
通过指向该项目的指针 c 删除待删除:
请注意,仅拥有指向某个列表项的指针并不能确保该项不会被释放,因此您需要有一个一种重新计数,这样该项目就不会在您尝试锁定它的那一刻被销毁。
请注意,在最后一个算法中,重试次数是有限的。确实,新项不能出现在c的左边(插入是在最右边的位置)。如果我们的步骤 5 失败,因此我们需要重试,这只能是由于同时从列表中删除 p 造成的。这样的删除最多可以发生 N-1 次,其中 N 是 c 在列表中的初始位置。当然,这种最坏的情况不太可能发生。
Why not just apply a coarse-grained lock? Just lock the whole queue.
A more elaborate (however not necessarily more efficient, depends on your usage pattern) solution would be using a read-wrote lock, for reading and writing, respectively.
Using lock-free operations seem to me not a very good idea for your case. Imagine that some thread is traversing your queue, and at the same moment the "current" item is deleted. Doesn't matter how many additional links your traverse algorithm holds, all that items may be deleted, so your code would have no chance to finish the traversal.
Another issue with compare-and-swap is that with pointers you never know whether it really points to the same old structure, or the old structure has been freed and some new structure is allocated at the same address. This may or may not be an issue for your algorithms.
For the case of "local" locking (i.e., the possibility to lock each list item separately), An idea would be to make the locks ordered. Ordering the locks ensures the impossibility of a deadlock. So your operations are like that:
Delete by the pointer p to the previous item:
Insert into the beginning:
This seems to be correct, I didn't however try this idea.
Essentially, this makes the double-linked list work as if it were a single-linked list.
If you don't have the pointer to the previous list element (which is of course usually the case, as it's virtually impossible to keep such a pointer in consistent state), you can do the following:
Delete by the pointer c to the item to be deleted:
Note that just having a pointer to some list item cannot ensure that the item is not deallocated, so you'll need to have a kind of refcounting, so that the item is not destroyed at the very moment you try to lock it.
Note that in the last algorithm the number of retries is bounded. Indeed, new items cannot appear on the left of c (insertion is at the rightmost position). If our step 5 fails and thus we need a retry, this can be caused only by having p removed from the list in the meanwhile. Such a removal can occur not more than N-1 times, where N is the initial position of c in the list. Of course, this worst case is rather unlikely to happen.
请不要严厉地对待这个答案,但也不要这样做。
你几乎肯定会遇到错误,而且是很难发现的错误。使用 pthreads 锁原语。它们是您的朋友,并且是由深刻了解您选择的处理器提供的内存模型的人编写的。如果你尝试用 CAS 和原子增量等做同样的事情,你几乎肯定会犯一些微妙的错误,直到为时已晚。
这里有一个小代码示例来帮助说明这一点。这把锁有什么问题吗?
答案是:释放锁时不存在内存屏障,这意味着在下一个线程进入锁之前,锁内执行的某些写操作可能还没有发生。哎呀! (可能还有更多错误,例如,该函数在自旋循环内没有执行适合平台的产量,因此极大地浪费了 CPU 周期。&c.)
如果您将双链表实现为一个带有哨兵节点的循环列表,那么您只需要执行两次指针赋值即可从列表中删除一个项目,四次即可添加一个项目。我确信您有能力对这些指针分配持有一个编写良好的排他锁。
请注意,我假设您不是少数几个深刻理解内存模型的人之一,只是因为世界上的内存模型很少。如果你是这些人中的一员,那么即使你也无法弄清楚这一事实应该表明它是多么棘手。 :)
我还假设您问这个问题是因为您有一些实际上想要工作的代码。如果这只是为了了解更多有关线程的知识而进行的学术练习(也许是成为深入的低级并发专家的一步),那么无论如何,请忽略我,并研究内存的细节您所定位的平台的型号。 :)
Please don't take this answer harshly, but don't do this.
You will almost certainly wind up with bugs, and very hard bugs to find at that. Use the pthreads lock primitives. They are your friends, and have been written by people who deeply understand the memory model provided by your processor of choice. If you try to do the same thing with CAS and atomic increment and the like, you will almost certainly make some subtle mistake that you won't find until it's far too late.
Here's a little code example to help illustrate the point. What's wrong with this lock?
The answer is: there's no memory barrier when releasing the lock, meaning that some of the write operations executed within the lock may not have happened before the next thread gets into the lock. Yikes! (There are probably many more bugs too, for example, the function doesn't do the platform-appropriate yield inside the spin loop and so is hugely wasteful of CPU cycles. &c.)
If you implement your double-linked list as a circular list with a sentinal node, then you only need to perform two pointer assignments in order to remove an item from the list, and four to add an item. I'm sure you can afford to hold a well-written exclusive lock over those pointer assignments.
Note that I am assuming that you are not one of the few people who deeply understand memory models only because there are very few of them in the world. If you are one of these people, the fact that even you can't figure it out ought to be an indication of how tricky it is. :)
I am also assuming that you're asking this question because you have some code you'd actually like to get working. If this is simply an academic exercise in order to learn more about threading (perhaps as a step on your way to becoming a deep low-level concurrency expert) then by all means, ignore me, and do your research on the details of the memory model of the platform you're targeting. :)
如果您维护严格的锁层次结构,则可以避免死锁:如果您要锁定多个节点,请始终首先锁定靠近列表头部的节点。因此,要删除一个元素,首先锁定该节点的前驱节点,然后锁定该节点,然后锁定该节点的后继节点,取消该节点的链接,然后以相反的顺序释放锁定。
这样,如果多个线程尝试同时删除相邻节点(例如链 ABCD 中的节点 B 和 C),那么第一个获得节点 B 锁的线程将是第一个取消链接的线程。线程 1 将锁定 A,然后锁定 B,然后锁定 C,线程 2 将锁定 B,然后锁定 C,然后锁定 D。只有 B 的竞争,并且线程 1 在等待线程持有的锁定时无法持有锁定如图2所示,当线程2正在等待线程1持有的锁时(即死锁)。
You can avoid deadlock if you maintain a strict hierarchy of locks: if you're locking multiple nodes, always lock the ones closer to the head of the list first. So, to delete an element, first lock the node's predecessor, then lock the node, then lock the node's successor, unlink the node, and then release the locks in reverse order.
This way, if multiple threads try to delete adjacent nodes simultaneously (say, nodes B and C in the chain A-B-C-D), then whichever thread first gets the lock to node B will be the one that will unlink first. Thread 1 will lock A, then B, then C, and thread 2 will lock B, then C, then D. There's only competition for B, and there's no way that thread 1 can hold a lock while waiting for a lock held by thread 2 and while thread 2 is waiting on the lock held by thread 1 (i.e. deadlock).
如果没有锁定整个列表,您就无法逃脱。原因如下:
插入空列表
线程 A 和 B 想要插入一个对象。
线程 A 检查列表,发现它是空的
。发生上下文切换。
线程 B 检查列表,发现它为空,并更新头部和尾部以指向其对象。
发生上下文切换,
线程 A 更新头部和尾部以指向其对象。线程 B 的对象已丢失。
从列表中间删除一个项目
线程 A 想要删除节点 X。为此,它首先必须锁定 X 的前驱节点、X 本身以及 X 的后继节点,因为所有这些节点都会受到操作的影响。要锁定 X 的前身,您必须执行类似的操作
虽然我使用了函数调用语法,但如果
spin_lock
是一个函数,那么您就已经死在水中了,因为在实际获得锁之前,这至少涉及三个操作:有两个地方可以交换线程A,另一个线程可以进入并删除X的前驱,而无需线程A 知道X 的前任已经改变。所以你必须以原子方式实现自旋锁本身。即,您必须向 X 添加一个偏移量以获得 x->prev,然后取消引用它以获得 *(x->prev) 并向其添加一个偏移量以获得 lockFlag,然后在一个原子单元中执行原子操作。否则,在您承诺锁定特定节点之后但在实际锁定它之前,总是有机会潜入某些东西。
You cannot get away without a lock for the whole list. Here's why:
Insert into an Empty List
Threads A and B wants to insert an object.
Thread A examines the list, finds it empty
A context switch occurs.
Thread B examines the list, finds it empty and updates the head and tail to point to its object.
A context switch occurs
Thread A updates the head and tail to point to its object. Thread B's object has been lost.
Delete an item from the middle of the list
Thread A wants to delete node X. For this it first has to lock X's predecessor, X itself and X's successor since all of these nodes will be affected by the operation. To lock X's predecessor you must do something like
Although I've used function call syntax, if
spin_lock
is a function, you are dead in the water because that involves at least three operations before you actually have the lock:There are two places there where thread A can be swapped out and another thread can get in and remove X's predecessor without thread A knowing that X's predecessor has changed. So you have to implement the spin lock itself atomically. i.e. you have to add an offset to X to get x->prev then dereference it to get *(x->prev) and add an offset to that to get lockFlag and then do an atomic operation all in one atomic unit. Otherwise there is always an opportunity for something to sneak in after you have committed to locking a particular node but before you have actually locked it.
我注意到,这里需要双向链表的唯一原因是需要从列表中间删除节点,这些节点是在不遍历列表的情况下获得的。显然,简单的 FIFO 可以用单链表(具有头指针和尾指针)来实现。
您可以通过引入另一层间接来避免从中间删除的情况 - 如果列表节点仅包含一个
next
指针和一个payload
指针,并且实际的数据指向其他地方(你说在插入点不可能分配内存,所以你只需要在分配有效负载本身的同一点分配列表节点结构)。在从中间删除的情况下,您只需将
payload
指针设置为NULL
并将孤立节点保留在列表中。如果 FIFO 弹出操作遇到这样的空节点,它只是释放它并重试。这种延迟允许您使用单链表,并且无锁单链表实现更容易正确实现。当然,围绕删除队列中间的节点仍然存在一个重要的竞争 - 似乎没有什么可以阻止该节点到达队列的前面并在决定它想要的线程之前被另一个线程删除删除它实际上有机会这样做。这场比赛似乎超出了您问题中提供的详细信息的范围。
I note that the only reason you need a doubly-linked list here is because of the requirement to delete nodes from the middle of the list, that were obtained without walking the list. A simple FIFO can obviously be implemented with a singly-linked list (with both head and tail pointers).
You could avoid the deletion-from-the-middle case by introducing another layer of indirection - if the list nodes simply contain a
next
pointer and apayload
pointer, with the actual data pointed to elsewhere (you say memory allocation is not possible at the point of insertion, so you'll just need to allocate the list node structure at the same point that you allocate the payload itself).In the delete-from-the-middle case, you simply set the
payload
pointer toNULL
and leave the orphaned node in the list. If the FIFO pop operation encounters such an empty node, it just frees it and tries again. This deferral lets you use a singly-linked list, and a lockless singly-linked list implementation is significantly easier to get right.Of course, there is still an essential race here around the removal of a node in the middle of the queue - nothing appears to stop that node coming to the front of the queue and being removed by another thread before the thread that has decided it wants to remove it actually gets a chance to do so. This race appears to be outside the scope of the details provided in your question.
两个想法。
首先,为了避免死锁问题,我会做某种自旋锁:
并循环
和循环
由于从列表中拼接一个元素作为一个操作并不是很长,所以这不会花费太多的性能开销。如果您确实急于同时删除所有元素,它仍然应该为您提供一些良好的并行性。
第二种是进行延迟删除。标记要删除的元素,只有当它们出现在列表末尾时才有效地删除它们。由于您只对头部和尾部感兴趣,列表项的有效用户可以执行此操作。好处是当它们在末尾被删除时,死锁问题就消失了。缺点是这使得最终的删除成为一个顺序操作。
Two ideas.
First, to avoid the deadlock problem I would do some sort of spinlock:
and loop
and loop
Since splicing an element out of a list is not very lengthy as an operation, this shouldn't cost you much performance overhead. And in case that you really have a rush to delete all elements at the same time, it still should give you some good parallelism.
The second would be to do lazy delete. Mark your elements that are to be deleted and only remove them effectively when they appear at the end of the list. Since you are only interested in the head and the tail the effective users of the list items can do this. The advantage is that when they are at the end when deleted, the deadlock problem disappears. The disadvantage is that this makes the final deletion a sequential operation.