寻找用 C 实现的快速排序整数数组交集/并集算法
我正在寻找实现快速排序整数数组交集/并集操作的 C 算法(或代码)。越快越好。
换句话说,在 C 中实现两个整数数组之间的并集和交集运算的有效方法是什么?
I am looking for C algorithms (or code) that implement fast sorted integer array intersection/union operations. The faster, the better.
In other words, what's an efficient way in C to implement union and intersection operations between two arrays of integers?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这并不是最快的,但它展示了正确的算法。
上面的算法是 O(aLen + bLen)
你可以做得更好,特别是当涉及到 2 个以上列表相交的问题时。
对于交集,基本算法是迭代所有要同时相交的排序列表。如果所有列表的头都匹配,则移动到所有列表中的下一个元素并将头添加到交集。如果没有,则查找最大可见元素,并尝试在所有其他列表中查找该元素。
在我的代码示例中,我只是继续迭代,但由于这些是排序列表,如果您希望 A 是数字 1 到 10000,B 是集合 {7777},您也可以通过二分搜索找到正确的索引。如果你想正确地找到具有多个列表的最大元素意味着使用堆。
如果您更改二分搜索,最坏的情况将上升到 O((aLen + bLen) * (lg(aLen + bLen)),但根据数据,您的平均情况可能会大大改善。
堆更改将是必要的当将多个集合相交时,由于上述算法变为 O(numLists * (所有列表中的元素总数)) 并且可以减少到 O(lg (numLists) * (所有列表中的元素总数)) 并
集基本上是相同算法,除非你总是添加每个元素,因此,在二分搜索中没有任何意义可以使用实现 peek 的堆来将其扩展到多个列表,基本规则是始终添加最小的元素,然后向前推进。每个列表的前面都有该元素。
This is not quite the fastest, but it demonstrates the right algorithm.
Above algorithm is O(aLen + bLen)
You can do better, especially when it comes to the problem of intersecting more than 2 lists.
For intersection the basic algorithm is to iterate through all sorted lists to be intersected at the same time. If the heads of all the lists match, move to the next element in all lists and add the head to the intersection. If not, find the maximum element visible, and attempt to find that element in all other lists.
In my code example, I just keep iterating, but since these are sorted lists, if you expect A to be the numbers 1 through 10000 and B to be the set {7777}, you can also binary search to the correct index. Finding the maximum element with multiple lists means using a heap if you want to do it properly.
If you make the binary search change, your worst case will go up to O((aLen + bLen) * (lg(aLen + bLen)), but depending on data, your average case might drastically improve.
The heap change will be necessary when intersecting many sets together, as the above algorithm becomes O(numLists * (total number of elements in all lists)) and can be reduced to O(lg (numLists) * (total number of elements in all lists))
Union is basically the same algorithm, except you always add every element, and as such, theres no point whatsoever in binary searching. Extending it to multiple lists can be done with a heap that implements peek, basic rule is to always add the smallest element, and step forward in every list that had that element at the front.
假设这些是实际的集合(因为数组与重复项的交集充其量是有问题的),以下代码可能会有所帮助。
首先,一些必需的标头:
以及一些强制性文档:
然后是正确的函数:
到目前为止,主要是函数的初始化。接下来的位遍历两个输入集,选择添加到结果中的内容。 (在我看来)最好分三个阶段来完成,第一个阶段是当两个输入集仍有剩余时选择一个元素。请注意此处对于并集和交集的不同行为,特别是交集仅在元素位于两个输入集中时才添加元素:
其他两个阶段(其中只有一个阶段将针对并集运行,对于交集则不会运行) ,只需从非空输入集中获取剩余项目:
和一个测试程序:
及其输出,如预期的那样:
Assuming that these are actual sets (since an intersection on arrays with duplicates is problematic at best), the following code may help.
First, some requisite headers:
and some mandatory documentation:
Then the function proper:
Up until there, it's mostly initialisation for the function. This following bit traverses the two input sets, selecting what gets added to the result. This is best done (in my opinion) as three phases, the first being selection of an element when both input sets still have some remaining. Note the differing behaviour here for unions and intersections, specifically intersections only have the element added if it's in both input sets:
The other two phases (of which only one will run for unions, and none for intersections), simply gets the remaining items from the non-empty input set:
And a test program:
along with its output, as expected: