普通的联合/查找算法在不需要任何额外工作的情况下线程安全吗?
标准的 union/find 或 Disjoint-set 数据结构具有非常好的运行时间(有效地对于单线程情况,O(1)
)。 然而,它在多线程情况下的有效性/性能如何? 我认为即使没有锁定或除了原子指针大小的写入之外的任何原子操作,它也是完全有效的。
有谁发现以下逻辑有问题吗?
首先,我假设指针大小的写入是原子的。 由此看来,不难看出您可以在多个线程中安全地运行 find 函数,因为唯一会发生的更新都是设置为相同的值。 如果您允许 find 函数返回调用时(而不是返回时)正确的答案,那么不难认为许多 find 和 a单个
union
可以同时运行;find
的参数不会改变,union
仅更新根,而find
永远不会更新根。至于剩下的情况(几个
union
),我认为这也有效,但我不太确定。
顺便说一句:我不要求解决方案与单线程版本一样高效。 (为了避免锁/原子,我也愿意放弃全局一致的状态。)
编辑:再看一下,多联合的情况不起作用,因为如果一侧不是新的根与其他东西联合(也不是作为根),然后你可以将它从第二个联合的另一侧切掉。
A = find(a) // union 1
B = find(b) // union 1
----
X = find(x) // union 2 (x == a) -> (X == A) -> (X.par aliases A.par)
Y = find(y) // union 2
X.par = Y // union 2
----
A.par = B // union 1
这可以通过 CAS 来回避:
while(!CAS(A == A.par, B)) A = find(A);
The standard union/find or Disjoint-set data structure has very good running times (effectively O(1)
) for single threaded cases. However what is it's validity/performance under multi-threaded cases? I think that it is totally valid even without locking or any atomic operations aside from atomic pointer sized writes.
Does anyone see any problems with the following logic?
First I will assume that pointer sized writes are atomic. From that, it's not hard to argue that you can safely run the
find
function in several threads as the only updates that will occur are all sets to the same value. If you allow thefind
function to return the answer that was true when it was called (as opposed to when it returned) it's not hard to argue that manyfind
s and a singleunion
can be run at the same time; the argument for thefind
s doesn't change and theunion
only updates roots and thefind
s never update roots.As for the remaining case (several
union
s), I think that works as well but I'm not so sure of it.
BTW: I'm not requiring that the solution be just as efficient as the single threaded version. (In order to avoid locks/atomic, I'm also willing to discard globally coherent state.)
edit: taking another look, the many-union cases doesn't work because if the side that is not the new root is unioned with something else (also not as a root) then you could cut it off of other side of the second union.
A = find(a) // union 1
B = find(b) // union 1
----
X = find(x) // union 2 (x == a) -> (X == A) -> (X.par aliases A.par)
Y = find(y) // union 2
X.par = Y // union 2
----
A.par = B // union 1
this can be sidestepped with a CAS:
while(!CAS(A == A.par, B)) A = find(A);
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
有点晚了,但供以后参考。 通过原子操作,您可以构建不相交的集合数据结构。 不变的是每个集合都由其最小成员表示,这可以避免由于竞争条件而导致的循环。
A little late but for future reference. With atomic operations, you can build a disjoint set data structure. The invariant is that each set is represented by its smallest member, which allows avoiding loops due to race conditions.
可用于此结构的同步类型与读写器问题。 性能将取决于将执行多少个联合,因为这样查找操作将停止。
我不确定您是否可以同时运行多个查找和一个联合,因为联合操作的最后一个案例不是原子的。
The type of synchronization which you can use for this structure is similar with the Readers-writers problem. The performance will depend on how many unions will be executed because then the find operation will be stopped.
I'm not sure that you can run many finds and a single union in the same time because the last case from an union operation is not atomic.