O(1) 不相交集合数据结构中的创建、查找、并集
今天,我和某人讨论了 Kruskal 最小生成树算法,因为这张幻灯片的第 13 页。
演示的作者说,如果我们使用(双向)链表实现不相交集,Make 和 Find 的性能将是 O(1)分别为 和 O(1)。 Union(u,v)运算的时间为min(nu,nv),其中nu和nv是存储 u 和 v 的集合的大小。
我说过我们可以通过将 Union(u,v) 的时间改进为 O(1)每个成员的表示指针指向一个定位器,该定位器包含指向该集合的真实表示的指针。
在 Java 中,数据结构如下所示:
class DisjointSet {
LinkedList<Vertex> list = new LinkedList<Vertex>(); // for holding the members, we might need it for print
static Member makeSet(Vertex v) {
Member m = new Member();
DisjointSet set = new DisjointSet();
m.set = set;
set.list.add(m);
m.vertex = v;
Locator loc = new Locator();
loc.representation = m;
m.locator = loc;
return m;
}
}
class Member {
DisjointSet set;
Locator locator;
Vertex vertex;
Member find() {
return locator.representation;
}
void union(Member u, Member v) { // assume nv is less than nu
u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
v.set = u.set;
v.locator.representation = u.locator.representation;
}
}
class Locator {
Member representation;
}
抱歉,代码很简单。如果可以这样做,那么每个不相交集合操作(Make、Find、Union)的运行时间将是O(1)。但与我讨论过的人却看不到任何改善。我想知道你对此的看法。
Find/Union 在各种实现中最快的性能是多少?我不是数据结构方面的专家,但通过快速浏览互联网,我发现没有恒定时间的数据结构或算法可以做到这一点。
Today, I had discussion with someone about Kruskal Minimum Spanning Tree algorithm because of page 13 of this slide.
The author of the presentation said that if we implement disjoint sets using (doubly) linked list, the performance for Make and Find will be O(1) and O(1) respectively. The time for operation Union(u,v) is min(nu,nv), where nu and nv are the sizes of the sets storing u and v.
I said that we can improve the time for the Union(u,v) to be O(1) by making the representation pointer of each member pointing a locator that contains the pointer to the real representation of the set.
In Java, the data structure would look like this :
class DisjointSet {
LinkedList<Vertex> list = new LinkedList<Vertex>(); // for holding the members, we might need it for print
static Member makeSet(Vertex v) {
Member m = new Member();
DisjointSet set = new DisjointSet();
m.set = set;
set.list.add(m);
m.vertex = v;
Locator loc = new Locator();
loc.representation = m;
m.locator = loc;
return m;
}
}
class Member {
DisjointSet set;
Locator locator;
Vertex vertex;
Member find() {
return locator.representation;
}
void union(Member u, Member v) { // assume nv is less than nu
u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
v.set = u.set;
v.locator.representation = u.locator.representation;
}
}
class Locator {
Member representation;
}
Sorry for the minimalistic code. If it can be made this way, than running time for every disjoint set operation (Make,Find,Union) will be O(1). But the one whom I had discussion with can't see the improvement. I would like to know your opinion on this.
And also what is the fastest performance of Find/Union in various implementations? I'm not an expert in data structure, but by quick browsing on the internet I found out there is no constant time data structure or algorithm to do this.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我的直觉与你的同事一致。你说:
u.set.list.append(v.set.list); // 假设方法,在 O(1) 中追加一个列表
看起来您的意图是通过追加完成联合。但是,要实现 Union,您必须删除重复项才能使结果成为一个集合。因此,例如,我可以看到针对固定集合大小的 O(1) 算法...
但这让我觉得这是作弊。如果您针对 N 的一般情况执行此操作,我不知道如何避免某种形式的迭代(或哈希查找)。这将使其成为 O(n)(或最多 O(log n))。
仅供参考:我很难遵循您的代码。在 makeSet 中,您构造一个永远不会转义该函数的本地定位器。看上去并没有什么作用。目前尚不清楚您在附录中的意图是什么。可能需要编辑并详细说明您的方法。
My intuition agrees with your colleague. You say:
u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
It looks like your intent is that the union is done via the append. But, to implement Union, you would have to remove duplicates for the result to be a set. So I can see an O(1) algorithm for a fixed set size, for example...
But that strikes me as cheating. If you're doing this for general cases of N, I don't see how you avoid some form of iterating (or hash lookup). And that would make it O(n) (or at best O(log n)).
FYI: I had a hard time following your code. In makeSet, you construct a local Locator that never escapes the function. It doesn't look like it does anything. And it's not clear what your intent is in the append. Might want to edit and elaborate on your approach.
使用 Tarjan 版本的 Union-Find 结构(具有路径压缩和排序功能)加权并集),m 查找和 (n-1) 混合并集的序列将在 O(m.α(m,n)) 中,其中 α(m ,n) 是 阿克曼函数 的反函数,对于 m< 的所有实际值/strong> 和 n 的值为 4。所以这基本上意味着 Union-Find 具有最坏情况摊销常数运算,但不完全是这样。
据我所知,虽然改进带来了更好的实践效率,但不可能获得更好的理论复杂性。
对于不相交集的特殊情况,例如语言理论中使用的不相交集,已经证明线性(即 O(1) 中的所有内容)适应是可能的——本质上是通过将节点分组在一起来实现的—— ——但这些改进不能转化为普遍问题。另一方面,类似的核心思想已被成功且巧妙地用于最小生成树的 O(n) 算法(Chazelle 算法)。
所以你的代码不可能是正确的。错误正是莫伦指出的:当您对两个集合进行并集时,您只更新每个列表的前导的“表示”,而不是所有其他元素的“表示”——同时假设在 find 函数中每个元素直接知道它的表示。
Using Tarjan's version of the Union-Find structure (with path compression and rank-weighed union), a sequence of m Finds and (n-1) intermixed Unions would be in O(m.α(m,n)), where α(m,n) is the inverse of Ackermann function which for all practical values of m and n has value 4. So this basically means that Union-Find has worst case amortized constant operations, but not quite.
To my knowledge, it is impossible to obtain a better theoretical complexity, though improvements have led to better practical efficiency.
For special cases of disjoint-sets such as those used in language theory, it has been shown that linear (i.e., everything in O(1)) adaptations are possible---essentially by grouping nodes together---but these improvements cannot be translated to the general problem. On the other hand of the spectrum, a somewhat similar core idea has been used with great success and ingenuity to make an O(n) algorithm for minimum spanning tree (Chazelle's algorithm).
So your code cannot be correct. The error is what Moron pointed out: when you make the union of two sets, you only update the "representation" of the lead of each list, but not of all other elements---while simultaneously assuming in the find function that every element directly knows its representation.