非唯一的 C++未排序交集算法
我一直在尝试想出一种方法来编写有效的算法来对两个向量/数组执行未排序的交集,但没有运气。我正在使用一个大型非唯一数组(通常有 500,000 到 1,000,000 个值)和一个相对较小(最多可能有 5000 个值)的唯一数组。
我已经看到这里建议的各种方法涉及 unordered_sets 等技术,但据我了解,如果其中一个数组不唯一,则这不起作用。其次,我希望输出向量包含这些公共值相对于较大数组的索引,而不是包含两个数组共有的数字的输出向量。因此,如果较大数组有 5 个位置等于较小数组中的值之一,则我需要这 5 个索引中的每一个。也许类似于 python 的 in1d 函数。
有人有什么想法吗?谢谢
I have been trying to come up with a way to write an efficient algorithm to perform an unsorted intersection on two vectors/arrays, but with no luck. I am working with one large non-unique array (generally 500,000 to 1,000,000 values), and one relatively smaller (maybe 5000 values max) unique array.
I have seen a variety of methods suggested on here involving techniques such as unordered_sets, but to my understanding, this doesn't work if one of the arrays is non-unique. Secondly, instead of having an output vector that contains the numbers common to both arrays, I'd like to have the output vector contain the indices of those common values with respect to the larger array. So, if the larger array has 5 locations that equal one of the values in the smaller array, I need each of those 5 indices. Perhaps something similar to python's in1d function.
Anyone have any ideas? Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
将唯一的边放入
unordered_set
中,并逐一遍历非唯一的边。如果您在unordered_set(unique_side)
中的non_unique_side[i]
处找到某个项目,请将i
添加到结果中。假设将
unordered_set
实现为具有O(1)
摊销插入和查找时间的哈希集,则此算法将获得O(L+S)
时间复杂度,其中L
是较大列表中的项目数,S
是较小集合中的项目数。这与您在交叉路口时所能达到的速度一样快。Put the unique side into an
unordered_set
, and go through the non-unique side one by one. If you find an item atnon_unique_side[i]
in theunordered_set(unique_side)
, addi
to the result.Assuming that
unordered_set
is implemented as a hash set withO(1)
amortized insertion and lookup times, this algorithm gets youO(L+S)
time complexity, whereL
is the number of items in the larger list, andS
is the number of items in the smaller set. This is as fast as you can do an intersection.创建另一个包含大数组中所有索引的向量。然后使用使用一级间接的谓词对索引进行排序,并对唯一数组执行相同的操作或就地对其进行排序。然后使用允许一级间接的比较进行正常有序交集,并将映射向量中的索引放入最终结果中。
Create another vector that contains all the indexes from the big array. Then sort the indexes using a predicate that uses one level of indirection, and either do the same for the unique array or sort it in place. Then do a normal ordered intersection using a comparison that allows for one level of indirection and places the index from the mapping vector into the final result.
您可以将大数组从其值映射到 int。
例如:
unordered_map
当您映射较大的数组时,只需增加找到的每个项目的值
然后您只需遍历较小的值,对于每个值,检查是否它存在于地图中。如果存在,则将映射 int 中的项目数添加到结果向量中。
因此,如果有 5 个 6,则 map[6] = 5.. 因此只需将 6 的 5 个实例添加到结果值中即可。
编辑:
如果您想要索引,您可以映射到 int 向量,并为每个值保留您找到的索引向量。
You could map the large array from its value to int.
for example:
unordered_map<int,int>
When you map the larger array, just increase the value for each item you find
Then you just need to go over the smaller value, and for each value, check if it exists in the map. If it exists, then add the number of items in the mapped int to the result vector.
so if you have 5 sixes, the map[6] = 5.. so just add 5 instances of 6 to the result value.
Edit:
If you want indices, you can map to a vector of int, and keep for each value the vector of indices you've found.