当前位置：文江博客话题详情

面试：列出内存有限的交集

发布于 2024-09-11 06:22:56 字数 101 浏览 2 评论 0原文

给定两组整数，大小为 M 和 N，其中 M < N. 对这两个集合执行内等连接（即找到两个列表的交集）。如果两个列表都在文件中并且可用内存的大小为 K < ，则如何执行它M＜氮

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

束缚ｍ 2024-09-18 06:22:56

首先使用就地排序算法对两个列表进行排序。

一旦 M 和 N 都排序完毕，计算交集就像一场竞赛。在每个列表的开头放置两个标记 a 和 b。执行这些步骤，直到任何标记到达列表末尾。用伪代码表示：

until a > M.size or b > N.size
    if M[a] < N[b]
        a++
        continue
    if N[b] < M[a]
        b++
        continue
    print M[a] // common element
    // Advance both indexes to avoid infinite loop
    a++
    b++

给定一个不错的就地排序算法，其复杂度将为O(nlogn)。

Using an in-place sorting algorithm to sort both the lists first.

Once both M an N are sorted, then calculating the intersection is like a race. Place two markers a and b at the beginning of each list. Do these steps until any marker reaches the end of the list. In pseudocode:

until a > M.size or b > N.size
    if M[a] < N[b]
        a++
        continue
    if N[b] < M[a]
        b++
        continue
    print M[a] // common element
    // Advance both indexes to avoid infinite loop
    a++
    b++

Given a decent in-place sorting algorithm, the complexity of this will be O(nlogn).

回复收藏 0 原文

深海夜未眠 2024-09-18 06:22:56

对于 M 的每个 k/2 子部分

 对于 N 的每个 k/2 子部分 
        对 M 子部分进行排序N子部分
        找到两个子部分的交集 
        搜索结果的每个元素，如果还没有则存储它

现在

for each k/2 subpart of M

 for each k/2 subpart of N 
        sort the M-subpart & N-subpart
        find intersection of both the subparts 
        search each element of the result and store it if it already not

present

回复收藏 0 原文

亚希 2024-09-18 06:22:56

将 N 分成 n 个部分 L (L < K)
一次读取 M 个数字，将该数字与当前部分中的所有数字进行比较 L
如果发现匹配，则将其写入文件

回复收藏 0 原文

不羁少年 2024-09-18 06:22:56

取 M 的 K/2 个元素和 N 的 K/2 个元素。
对 M 子集和 N 子集进行排序。
现在交叉点是微不足道的。
写入交集，删除交集的元素，写回左侧元素。
继续 M 和 N 的所有其他 K/2 子部分。
现在，您在第三个文件中拥有一些常见元素，以及一些部分排序的列表。
现在，对于 M 和 N 列表中的每个 K / 2（减去删除的元素），比较它们以有效地找到交集。

（您还可以添加额外的 2 M 子集和 N 子集合并轮次以加快交集速度）。

万岁！

执行示例：

Ok let's take 2 lists of 9 integers with values between 0 and 9.
These lists will be
4 2 5 1 9 2 9 0 2
and
4 7 4 0 8 6 2 6 5
(generated by rand())

So we take two sublists : 
4 2 5 and 4 7 4

Let's sort them using quicksort
4 2 5
i   j <- two pointers
pivot = [0] = 4

increase i until it's greater than pivot = 4
i and j now points to 5
decrease j until it's smaller than pivot = 4
j points to 2
i is greater than j, thus except for pivot, the list is sorted.
Swap [0] and [j] to finish sorting
2 4 5.

Except for pivot + i + j, no extra allocation was needed (For 3 elements it's hard to believe, see any quicksort applet to have a better understanding).
Thus, from an algorithmic point of view, pivot + i + j are constant and can be neglected (in fact you would do memory - algorithm size to have the real K).

Do the same for the second set
(4 4 7)

Now with two pointers to the lists initialized to the start of the sublists, compare the pointed values.
2 - 4
Increase first sublist's pointers
4 - 4
Equal -> a first common element, remove them from list [2 5] and [4 7]
5 - 4
Increase second pointer
5 - 7
Increase first pointer
End of this sublists.

Dump this to original lists.
Do this for any other sublists
[1 9 2] and [0 8 6] -> [1 2 9] [0 6 8] -> no intersection
[9 0 2] and [2 6 5] -> [0 9 E] [5 6 E] -> [2]

Now we have : 
[2 5 E 1 2 9 0 9 E] and [4 7 E 0 6 8 5 6 E] with E for empty elements.
Now compare subsets with other subsets (sorted) (excepted for already processed sets (same index))
[2 5 E] with [0 6 8] and [6 5 E] -> becomes [2 E E] and [0 6 8] [6 E E]  (intersection [5])
Then
[1 2 9] with [4 7 E] and [6 E E] -> [1 2 9] and [4 7 E] [6 E E] (no intersection)
Then
[0 9 E] with [4 7 E] and [0 6 8] -> [9 E E] and [4 7 E] [6 8 E] (intersection [0])
End : 
[2 E E 1 2 9 9 E E] [4 7 E 6 8 E 6 E E] with common elements [4 2 5 0]

take K/2 elements of M and K/2 elements of N.
Sort M-subset and N-subset.
Now the intersection is trivial.
Write the intersection, drop elements of the intersection, write back the left elements.
Continue with all other K/2 subparts of M and N.
You have now some common elements in a third file, and some partially sorted lists.
Now for each K / 2 (minus removed elements) of M and N lists, compare them to find intersection efficiently.

(You also can add extra rounds of merging of 2 M-subsets and N-subsets to speed up intersection).

Hurrah !

Example of execution:

Ok let's take 2 lists of 9 integers with values between 0 and 9.
These lists will be
4 2 5 1 9 2 9 0 2
and
4 7 4 0 8 6 2 6 5
(generated by rand())

So we take two sublists : 
4 2 5 and 4 7 4

Let's sort them using quicksort
4 2 5
i   j <- two pointers
pivot = [0] = 4

increase i until it's greater than pivot = 4
i and j now points to 5
decrease j until it's smaller than pivot = 4
j points to 2
i is greater than j, thus except for pivot, the list is sorted.
Swap [0] and [j] to finish sorting
2 4 5.

Except for pivot + i + j, no extra allocation was needed (For 3 elements it's hard to believe, see any quicksort applet to have a better understanding).
Thus, from an algorithmic point of view, pivot + i + j are constant and can be neglected (in fact you would do memory - algorithm size to have the real K).

Do the same for the second set
(4 4 7)

Now with two pointers to the lists initialized to the start of the sublists, compare the pointed values.
2 - 4
Increase first sublist's pointers
4 - 4
Equal -> a first common element, remove them from list [2 5] and [4 7]
5 - 4
Increase second pointer
5 - 7
Increase first pointer
End of this sublists.

Dump this to original lists.
Do this for any other sublists
[1 9 2] and [0 8 6] -> [1 2 9] [0 6 8] -> no intersection
[9 0 2] and [2 6 5] -> [0 9 E] [5 6 E] -> [2]

Now we have : 
[2 5 E 1 2 9 0 9 E] and [4 7 E 0 6 8 5 6 E] with E for empty elements.
Now compare subsets with other subsets (sorted) (excepted for already processed sets (same index))
[2 5 E] with [0 6 8] and [6 5 E] -> becomes [2 E E] and [0 6 8] [6 E E]  (intersection [5])
Then
[1 2 9] with [4 7 E] and [6 E E] -> [1 2 9] and [4 7 E] [6 E E] (no intersection)
Then
[0 9 E] with [4 7 E] and [0 6 8] -> [9 E E] and [4 7 E] [6 8 E] (intersection [0])
End : 
[2 E E 1 2 9 9 E E] [4 7 E 6 8 E 6 E E] with common elements [4 2 5 0]

回复收藏 0 原文

凉宸 2024-09-18 06:22:56

我认为您可以使用位集来实现此目的。BitSet 每个整数仅消耗一位。
希望这有帮助。


BitSet b=new BitSet();//set 1 with {10,20,30}
BitSet c=new BitSet();//set 2 with {20,30,1,2}
b.set(10);
b.set(20);
b.set(30);
c.set(20);
c.set(30);
c.set(1);
c.set(2);
b.and(c);
System.out.println(b);//{20, 30}

I think you can use bit set for this purpose.BitSet only consumes one bit per integer.
Hope this helps.


BitSet b=new BitSet();//set 1 with {10,20,30}
BitSet c=new BitSet();//set 2 with {20,30,1,2}
b.set(10);
b.set(20);
b.set(30);
c.set(20);
c.set(30);
c.set(1);
c.set(2);
b.and(c);
System.out.println(b);//{20, 30}

回复收藏 0 原文