Efficient list intersection algorithm
Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm to find the set intersection of those lists?
I don't believe I have access to hashing algorithms.
Comments (15)
You could put all elements of the first list into a hash set. Then iterate over the second list and, for each of its elements, check the hash set to see whether it exists in the first list. If so, output it as an element of the intersection.
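The hash set idea above can be sketched roughly as follows. This is only an illustration in Python, assuming the elements are hashable; the function name `intersect_with_hash_set` is made up for the example.

```python
def intersect_with_hash_set(first, second):
    """Return the values of `second` that also occur in `first`, each once."""
    seen = set(first)      # hash set built from the first list
    emitted = set()        # keeps a common value from being reported twice
    result = []
    for item in second:
        if item in seen and item not in emitted:
            result.append(item)
            emitted.add(item)
    return result


print(intersect_with_hash_set([4, 1, 7, 3], [3, 9, 4, 4]))  # [3, 4]
```

Building the set is O(n) on average and each lookup is O(1) on average, so the whole thing is about O(n + m).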
You might want to take a look at Bloom filters. They are bit vectors that give a probabilistic answer as to whether an element is a member of a set. Set intersection can be approximated with a simple bitwise AND of two filters built with the same size and hash functions. If you have a large number of empty intersections, the Bloom filter can help you eliminate those quickly. You'll still have to resort to one of the other algorithms mentioned here to compute the actual intersection, however.
http://en.wikipedia.org/wiki/Bloom_filter
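A rough sketch of how a Bloom filter could be used as a pre-check, in Python. The class `BloomFilter`, its default sizes, and the use of `hashlib.sha256` to derive bit positions are all assumptions made for this example, not something from the answer above. A zero bitwise AND proves the lists are disjoint; a non-zero AND only means they might overlap, so candidates are still verified exactly.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter; a Python int serves as the bit vector."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filter(items):
    bloom = BloomFilter()
    for item in items:
        bloom.add(item)
    return bloom


def definitely_disjoint(filter_a, filter_b):
    # A common element would set the same bits in both filters,
    # so an all-zero AND rules out any common element.
    return (filter_a.bits & filter_b.bits) == 0


def intersect_with_prefilter(first, second):
    bloom = build_filter(first)
    result = []
    for item in second:
        # Skip the exact (linear) check when the filter says "definitely absent".
        if bloom.might_contain(item) and item in first and item not in result:
            result.append(item)
    return result
```

The exact check here is a plain linear search, which stands in for whichever of the other algorithms in this thread you end up using.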
Without hashing, I suppose you have two options:
From the eviews features list it seems that it supports complex merges and joins (if this is 'join' as in DB terminology, it will compute an intersection). Now dig through your documentation :-)
Additionally, eviews has its own user forum. Why not ask there?
With set 1, build a binary search tree. Each insertion is O(log n) if the tree stays balanced, so building it takes O(n log n). Then iterate over set 2 and search the BST for each element: m × O(log n).
So the total is O(n log n) + O(m log n) = O((n + m) log n).
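A minimal Python sketch of the BST approach, written with loops only since the question asks for a non-recursive algorithm. The names `Node`, `bst_insert`, `bst_contains` and `intersect_with_bst` are invented for this example. Note the assumption: this tree is not self-balancing, so the O(log n) per operation only holds when the input does not arrive in (nearly) sorted order; a balanced tree would be needed for a guarantee.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key iteratively; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root                      # already present
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def bst_contains(root, key):
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def intersect_with_bst(first, second):
    root = None
    for key in first:
        root = bst_insert(root, key)
    found = None                             # second tree: values already reported
    result = []
    for key in second:
        if bst_contains(root, key) and not bst_contains(found, key):
            result.append(key)
            found = bst_insert(found, key)
    return result


print(intersect_with_bst([5, 2, 8, 1], [8, 3, 5, 5]))  # [8, 5]
```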
In C++, the following can be tried using an STL map:
Here is another possible solution I came up with. It takes O(n log n) time and no extra storage. You can check it out here: https://gist.github.com/4455373
Here is how it works: assuming that the sets do not contain any repetition, merge all the sets into one and sort it. Then loop through the merged set and, on each iteration, look at the window between the current index i and i+n, where n is the number of sets available in the universe. What we look for as we loop is a run of n equal values, one for each set in the universe.
If the window starting at i holds the same value throughout, the element at i is repeated n times, which is equal to the total number of sets. Since there are no repetitions within any set, each of the sets must contain that value, so we add it to the intersection. Then we advance the index past whatever remains between i and i+n, because none of those indexes can start a repeating sequence.
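The merge-and-sort idea above might look roughly like this in Python. This is not the code from the linked gist, just a sketch under the same assumption that no input set contains duplicates; the name `intersect_by_merging` is hypothetical. Unlike the description, it allocates a merged list and advances by one position on a mismatch, which is simpler but still correct.

```python
def intersect_by_merging(sets):
    """Intersect several duplicate-free lists by sorting their concatenation."""
    n = len(sets)
    if n == 0:
        return []
    merged = sorted(x for s in sets for x in s)
    result = []
    i = 0
    while i + n <= len(merged):
        # In sorted order, equal ends of a window imply the whole window is equal.
        if merged[i] == merged[i + n - 1]:
            result.append(merged[i])   # value occurs n times: once per set
            i += n
        else:
            i += 1
    return result


print(intersect_by_merging([[3, 1, 2], [2, 3, 4], [5, 3, 2]]))  # [2, 3]
```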
First, sort both lists using quicksort: O(n*log(n)). Then, compare the lists by browsing the lowest values first, and add the common values. For example, in Lua:
which is O(max(n, m)), where n and m are the sizes of the lists.
EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations.
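A minimal sketch of the sort-then-merge approach, in Python rather than Lua. It uses Python's built-in `sorted` in place of quicksort (an assumption of this example; any O(n log n) sort works), and the function name `intersect_sorted_merge` is made up.

```python
def intersect_sorted_merge(a, b):
    """Sort both lists, then walk them in step, always advancing the smaller side."""
    a = sorted(a)
    b = sorted(b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Common value; skip it if it was already reported.
            if not result or result[-1] != a[i]:
                result.append(a[i])
            i += 1
            j += 1
    return result


print(intersect_sorted_merge([4, 1, 7, 3, 3], [3, 9, 4, 3]))  # [3, 4]
```

The merge loop itself touches each element at most once, so the sorting step dominates the running time.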
Using skip pointers and SSE instructions can improve list intersection efficiency.
Why not implement your own simple hash table or hash set? It's worth it to avoid an n log n intersection if your lists are as large as you say.
Since you know a bit about your data beforehand, you should be able to choose a good hash function.
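As an illustration of how little is needed for a hand-rolled hash set, here is a sketch in Python using separate chaining. It assumes the elements are integers; the class name `SimpleIntHashSet`, the multiplier, and the bucket counts are arbitrary choices for this example, and a real hash function should be picked to suit your data.

```python
class SimpleIntHashSet:
    """Hash set for integers using separate chaining (illustrative only)."""

    def __init__(self, num_buckets=64):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Hypothetical hash: multiply by a large odd constant, then take the modulus.
        return self.buckets[(key * 2654435761) % len(self.buckets)]

    def add(self, key):
        """Add key; return True if it was not present before."""
        bucket = self._bucket(key)
        if key in bucket:
            return False
        bucket.append(key)
        return True

    def __contains__(self, key):
        return key in self._bucket(key)


def intersect_with_simple_hash(first, second):
    table = SimpleIntHashSet(max(8, 2 * len(first)))
    for key in first:
        table.add(key)
    emitted = SimpleIntHashSet(max(8, 2 * len(second)))
    result = []
    for key in second:
        # emitted.add returns False for values that were already reported.
        if key in table and emitted.add(key):
            result.append(key)
    return result


print(intersect_with_simple_hash([10, -3, 7, 42], [42, 5, -3, 42]))  # [42, -3]
```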
I second the "sets" idea. In JavaScript, you could use the first list to populate an object, using the list elements as property names. Then you take the elements of the second list and check whether those properties exist.
If sets (as you call them in the title) are supported as a built-in, there is usually an intersection method.
Anyway, as someone said, you can do it easily if you have the lists sorted (I will not post code, someone already did). If you can't use recursion, that is no problem: there are recursion-free quicksort implementations.
In PHP, something like
From the definition of Big-Oh notation:
In practice this means that if the two lists are relatively small, say fewer than 100 elements each, two nested for loops work just fine. Loop over the first list and look for an equal object in the second.
In my case it works just fine because I won't have more than 10 - 20 elements in my lists.
However, a good solution is to sort the first list, O(n log n), sort the second, also O(n log n), and merge them, which is O(n). Overall that is O(n log n), assuming the two lists are the same size.
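For small lists, the two nested loops mentioned above could be as simple as this Python sketch (the function name `intersect_nested_loops` is made up). It is O(n * m), which is fine for a few dozen elements.

```python
def intersect_nested_loops(a, b):
    """Brute-force intersection: compare every element of a with every element of b."""
    result = []
    for x in a:
        for y in b:
            if x == y:
                if x not in result:   # report each common value once
                    result.append(x)
                break                 # no need to scan the rest of b
    return result


print(intersect_nested_loops([1, 2, 2, 3], [2, 3, 5]))  # [2, 3]
```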
Time: O(n), space: O(1) solution for identifying the point of intersection.
For example, two pointers starting at the two given head nodes find the point of intersection by switching to the other list's head every time they reach the end. Video Explanation Here.
Thanks.
Edit
My interpretation of intersection is finding the point of intersection.
For example:
For the given lists A and B, A and B will "meet/intersect" at point c1, and the algorithm above will return c1. As OP stated that OP has no access to hashmaps of any sort, I believe OP is saying that the algorithm should have O(1) space complexity. I got this idea from Leetcode some time ago, if interested: Intersection of Two Linked Lists.
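The pointer-switching idea can be sketched in Python as follows. The `ListNode` class and the function name `get_intersection_node` are assumptions for this example. Note that this finds the node where two singly linked lists physically merge, which is a different problem from the set intersection asked about in the question.

```python
class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def get_intersection_node(head_a, head_b):
    """Return the first node shared by both lists, or None if they never meet."""
    a, b = head_a, head_b
    # Each pointer walks its own list, then the other one. Both cover the same
    # total distance, so they line up at the shared node (or both end at None).
    while a is not b:
        a = a.next if a is not None else head_b
        b = b.next if b is not None else head_a
    return a


# Shared tail c1 -> c2 -> c3, reached from A (2 nodes) and B (3 nodes).
c1 = ListNode("c1", ListNode("c2", ListNode("c3")))
list_a = ListNode("a1", ListNode("a2", c1))
list_b = ListNode("b1", ListNode("b2", ListNode("b3", c1)))
print(get_intersection_node(list_a, list_b).value)  # c1
```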