Delphi中N数组的交集
为了找到 N 个数组的交集,我有这个实现,但效率非常低。我知道必须有一种算法来加快速度。
注意: myarray 是包含我想要查找其交集的所有其他数组的数组。
var
i, j, k: integer;
myarray: Array of Array of integer;
intersection: array of integer;
for I := 0 to length(myarray)-1 do
begin
for J := 0 to length(myarray)-1 do
begin
if i = j then
continue;
for k := 0 to length(myarray[i])-1 do
begin
if myarray[i][j] = myarray[j][k] then
begin
setLength(intersection, length(intersection)+1);
intersection[length(intersection)-1] := myarray[j][k];
end;
end;
end;
end;
我可以应用什么优化来加快速度?有没有更快的方法来做到这一点?
编辑:数组中的数据未排序。
To find the intersection of N arrays I have this implementation, which is horribly inefficient. I know there has to be an algorithm out there to speed this up.
note: myarray is the array containing all my other arrays for which I want to find the intersection for.
var
i, j, k: integer;
myarray: Array of Array of integer;
intersection: array of integer;
for I := 0 to length(myarray)-1 do
begin
for J := 0 to length(myarray)-1 do
begin
if i = j then
continue;
for k := 0 to length(myarray[i])-1 do
begin
if myarray[i][j] = myarray[j][k] then
begin
setLength(intersection, length(intersection)+1);
intersection[length(intersection)-1] := myarray[j][k];
end;
end;
end;
end;
What optimization can I apply to speed this up? Is there a faster way of doing this?
EDIT: Data in arrays are unsorted.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
有一种更快的方法:列表比较算法。它允许您以线性时间而不是二次时间比较两个列表。基本思想如下:
这可以扩展到处理 2 个以上的列表,只需付出一些努力。
There is a faster way: the list comparison algorithm. It allows you to compare two lists in linear time instead of quadratic time. Here's the basic idea:
This can be extended to deal with more than 2 lists with a bit of effort.
不幸的是,您尚未更新您的问题,因此仍然不清楚您在问什么。例如,您谈论一个交集(它应该搜索每个数组中存在的值),但从(不起作用的)代码来看,您似乎只是在任何数组中搜索重复项。
尽管Mason的答案指出了此类算法的明显通用解决方案,但我相信对于这样的多算法来说,它有些不同维数组。我制定了两个例程来确定(1)交集以及(2)重复项。两者都假设数组中长度不等的无序内容。
首先,我决定引入一些新类型:
其次,这两个例程都需要某种排序机制。一个非常快速但肮脏的方法是通过使用/误用
TList
来完成:但是通过调整
Classes.QuickSort
中的 RTL 代码可以获得更好的实现,它的作用正是与上面相同,无需复制数组(两次):交集:
要获得所有数组的交集,将最短数组中的所有值与所有其他数组中的值进行比较就足够了。由于最短数组可能包含重复值,因此对该小数组进行排序以便能够忽略重复项。从这一点来看,只需在其他数组之一中查找(或者更确切地说不查找)相同的值即可。没有必要对所有其他数组进行排序,因为在比已排序数组更早的位置找到值的机会是 50%。
重复项:
这也不需要单独对所有数组进行排序。所有值都被复制到一维临时数组中。对数组进行排序后,很容易找到重复项。
示例应用程序:
因为我喜欢测试我的所有代码,所以只需很少的工作就可以使其具有一定的代表性。
Unfortunately, you have not updated your question yet, so it still is not exactly clear what you are asking. E.g. you talk about an intersection (which should search for values that exist in every single array), but from the (not working) code it seems you are simply searching for duplicates in any of the arrays.
Although Mason's answer points to an obvious general solution for these kind of algorithms, I believe it is somewhat different for such a multi-dimensional array. I worked out two routines for determination of (1) the intersection as well as (2) the duplicates. Both assume unordered content of unequal length in the arrays.
First, I decided to introduce some new types:
Secondly, both routines need some sorting mechanism. A very quick but dirty one is done by employing/misusing a
TList
:But a much nicer implementation is gotten by adjusting the RTL code from
Classes.QuickSort
, which does exactly the same as the one above, without copying the array (twice):Intersection:
To obtain the intersection of all arrays, comparing all values in the shortest array with the values in all other arrays is enough. Because the shortest array may contain duplicate values, that small array is sorted in order to be able to ignore the duplicates. From that point it is simply a matter of finding (or rather nót finding) a same value in one of the other arrays. Sorting all other arrays is not necessary, because the chance to find a value at an earlier position than within a sorted array is 50%.
Duplicates:
This also does not require sorting all arrays individually. All values are copied into a one-dimensional temporary array. After sorting thát array, it is easy to find the duplicates.
Sample application:
And because I like to test all my code, it needed very little work to make it somewhat representative.
不应该是这样吗
?
无论如何,您可以对此代码进行的最明显、最简单的优化是将其更改
为此
我的下一步是摆脱内部循环中的外部索引表达式:
在 I 和 J 循环中,创建指向两个数组的指针整数,然后执行
然后在内部循环中你可以执行
Shouldn't that be
?
Anyway, the most obvious, simple optimization you can make to this code is changing this
into this
My next step would be to get rid of the outer index expressions in the inner loop:
In the I and J loops, create pointers to two arrays of integers, then do
Then in the inner loop you can do