找到两个集合的最大公共子集的有效算法?
每组都包含一堆校验和。例如:
A组:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
07736dde2f8484a4a3af463e05f039e3
5b1e374ff2ba949ab49870ca24d3163a
}
B组:
{
6639e1da308fd7b04b7635a17450df7c
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
A 和 B 的最大公共子集是: {
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
将
执行许多这样的操作,因此我正在寻找一种有效的算法来执行此操作。 感谢您的帮助。
Each set contains bunch of checksums. For example:
Set A:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
07736dde2f8484a4a3af463e05f039e3
5b1e374ff2ba949ab49870ca24d3163a
}
Set B:
{
6639e1da308fd7b04b7635a17450df7c
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
The maximum common subset of A and B is:
{
4445968d0e100ad08323df8c895cea15
a67f8052594d6ba3f75502c0b91b868f
}
A lot of these operations will be performed, so I'm looking for an efficient algorithm to do so.
Thanks for your help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
将其中一个集合放入哈希表中,然后迭代另一个集合,丢弃不在哈希表中的元素。或者,对两者进行排序并同时迭代它们,就像合并排序一样。
编辑:后一种方法创建排序结果。我应该补充一点,如果这些集合的大小相差很大并且它们是预先排序的(比如说因为您正在做一堆交叉点),那么您可以通过使用“无界”二分搜索在大名单。
Put one of the sets in a hash table and iterate through the other, discarding elements that aren't in the hash. Alternatively, sort both and iterate through them simultaneously, as in merge sort.
EDIT: The latter method creates a sorted result. I should add that if the sets are of widely disparate sizes and they're presorted (say because you're doing a bunch of intersections), then you can realize a large performance improvement by using "unbounded" binary search to skip ahead in the large list.
将它们放入哈希表中并记下确切的冲突。
Stick them in a hashtable and note the exact collisions.
Set C是你的公共子集。
Set C is your common subset.
当底层集合结构有序时 - 常见情况是一种树(BST、AVL 等) - 那么您只需执行最后一步。
为了清楚地说明最后一步,这是它的伪代码:
When underlying set structure is ordered - common case is a kind of Tree (BST,AVL etc.), - then you need only last step to perform.
To make last step clear, here is it's pseudocode: