当前位置：文江博客话题详情

list algorithm set-intersection

高效的列表交集算法

发布于 2024-07-13 04:08:56 字数 68 浏览 16 评论 0原文

给定两个列表（不一定是排序的），找到这些列表的集合交集的最有效的非递归算法是什么？
我不相信我有权使用哈希算法。

收藏 0

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

评论（15）

静谧幽蓝 2024-07-20 04:08:56

您可以将第一个列表的所有元素放入哈希集中。然后，迭代第二个列表，并针对其每个元素检查哈希以查看它是否存在于第一个列表中。如果是，则将其作为交集的元素输出。

回复收藏 0 原文

起风了 2024-07-20 04:08:56

您可能想看看布隆过滤器。它们是位向量，给出元素是否是集合成员的概率答案。集合交集可以通过简单的按位与运算来实现。如果您有大量空交集，布隆过滤器可以帮助您快速消除这些空交集。但是，您仍然需要求助于此处提到的其他算法之一来计算实际的交集。
http://en.wikipedia.org/wiki/Bloom_filter

回复收藏 0 原文

人│生佛魔见 2024-07-20 04:08:56

如果没有散列，我想您有两个选择：

天真的方法是将每个元素与其他每个元素进行比较。 O(n^2)
另一种方法是先对列表进行排序，然后迭代它们： O(n lg n) * 2 + 2 * O(n)

回复收藏 0 原文

寄居者 2024-07-20 04:08:56

从 eviews 功能列表来看，它似乎支持复杂的合并和连接（如果这是' join'（如数据库术语中所示），它将计算交集）。现在深入研究您的文档:-)

此外，eviews 有自己的用户论坛 - 为什么不在那里询问_

回复收藏 0 原文

夜血缘 2024-07-20 04:08:56

使用集合 1 构建一个具有 O(log n) 的二叉搜索树，并迭代 set2 并搜索 BST m XO(log n) 所以总共 O(log n ) + O(m)+O(log n) ==> O(log n)(m+1)

回复收藏 0 原文

日记撕了你也走了 2024-07-20 04:08:56

在 C++ 中，可以使用 STL 映射尝试以下操作

vector<int> set_intersection(vector<int> s1, vector<int> s2){

    vector<int> ret;
    map<int, bool> store;
    for(int i=0; i < s1.size(); i++){

        store[s1[i]] = true;
    }
    for(int i=0; i < s2.size(); i++){

        if(store[s2[i]] == true) ret.push_back(s2[i]);

    }
    return ret;
}

in C++ the following can be tried using STL map

vector<int> set_intersection(vector<int> s1, vector<int> s2){

    vector<int> ret;
    map<int, bool> store;
    for(int i=0; i < s1.size(); i++){

        store[s1[i]] = true;
    }
    for(int i=0; i < s2.size(); i++){

        if(store[s2[i]] == true) ret.push_back(s2[i]);

    }
    return ret;
}

回复收藏 0 原文

深爱成瘾 2024-07-20 04:08:56

这是我提出的另一种可能的解决方案，时间复杂度为 O(nlogn)，并且不需要任何额外的存储。您可以在这里查看 https://gist.github.com/4455373

以下是它的工作原理：假设集合不包含任何重复，将所有集合合并为一个集合并排序。然后循环遍历合并的集合，并在每次迭代时在当前索引 i 和 i+n 之间创建一个子集，其中 n 是宇宙中可用集合的数量。我们在循环时寻找的是一个大小为 n 的重复序列，该序列等于宇宙中的集合数。

如果 i 处的子集等于 n 处的子集，则意味着 i 处的元素重复 n 次，这等于集合的总数。由于任何集合中都没有重复，这意味着每个集合都包含该值，因此我们将其添加到交集。然后我们将索引移动 i + 它和 n 之间剩余的内容，因为这些索引肯定不会形成重复序列。

回复收藏 0 原文

对不⑦ 2024-07-20 04:08:56

首先，使用快速排序对两个列表进行排序：O(n*log(n)。然后，通过首先浏览最低值来比较列表，然后添加公共值。例如，在 lua) 中：

function findIntersection(l1, l2)
    i, j = 1,1
    intersect = {}

    while i < #l1 and j < #l2 do
        if l1[i] == l2[i] then
            i, j = i + 1, j + 1
            table.insert(intersect, l1[i])
        else if l1[i] > l2[j] then
            l1, l2 = l2, l1
            i, j = j, i
        else
            i = i + 1
        end
    end

    return intersect
end

即 O(max (n, m)) 其中 n 和 m 是列表的大小。

编辑：快速排序是递归的，如评论中所述，但看起来有非递归实现

First, sort both lists using quicksort : O(n*log(n). Then, compare the lists by browsing the lowest values first, and add the common values. For example, in lua) :

function findIntersection(l1, l2)
    i, j = 1,1
    intersect = {}

    while i < #l1 and j < #l2 do
        if l1[i] == l2[i] then
            i, j = i + 1, j + 1
            table.insert(intersect, l1[i])
        else if l1[i] > l2[j] then
            l1, l2 = l2, l1
            i, j = j, i
        else
            i = i + 1
        end
    end

    return intersect
end

which is O(max(n, m)) where n and m are the sizes of the lists.

EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations

回复收藏 0 原文

南烟 2024-07-20 04:08:56

使用跳过指针和 SSE 说明可以提高列表交叉效率。

回复收藏 0 原文

浅黛梨妆こ 2024-07-20 04:08:56

为什么不实现自己的简单哈希表或哈希集？如果您的列表如您所说的很大，那么避免 nlogn 交叉是值得的。

由于您事先对数据有所了解，因此您应该能够选择一个好的哈希函数。

回复收藏 0 原文

缱倦旧时光 2024-07-20 04:08:56

我赞同“集合”的想法。在 JavaScript 中，您可以使用第一个列表来填充对象，并使用列表元素作为名称。然后，您使用第二个列表中的列表元素并查看这些属性是否存在。

回复收藏 0 原文

不交电费瞎发啥光 2024-07-20 04:08:56

如果支持集（正如您在标题中所称的那样）作为内置，通常有一个交集方法。

无论如何，正如有人所说，如果您对列表进行排序，您可以轻松做到这一点（我不会发布代码，有人已经这样做了）。如果不能使用递归也没有问题。有无递归快速排序实现。

回复收藏 0 原文

雪花飘飘的天空 2024-07-20 04:08:56

在 PHP 中，类似

function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
  $counts = Array(); $result = Array();
  foreach ($X AS $x) {
    foreach ($x AS $y) { $counts[$y]++; }
  }
  foreach ($counts AS $x => $count) {
    if ($count == count($X)) { $result[] = $x; }
  }
  return $result;
}

In PHP, something like

function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
  $counts = Array(); $result = Array();
  foreach ($X AS $x) {
    foreach ($x AS $y) { $counts[$y]++; }
  }
  foreach ($counts AS $x => $count) {
    if ($count == count($X)) { $result[] = $x; }
  }
  return $result;
}

回复收藏 0 原文

七堇年 2024-07-20 04:08:56

根据 Big-Oh 表示法的定义：

T(N) = O(f(N)) 如果存在正常数 c 和 n 0 使得
当 N ≥ n 0 时，T(N) ≤ cf(N)。

这实际上意味着，如果两个列表的大小相对较小，则每两个 for 循环中少于 100 个元素就可以了。循环第一个列表并在第二个列表中查找相似的对象。
就我而言，它工作得很好，因为我的列表中最多不会有超过 10 - 20 个元素。
然而，一个好的解决方案是对第一个 O(n log n) 进行排序，对第二个 O(n log n) 进行排序并合并它们，另一个 O(n log n) 大致来说是 O(3 n log n)，可以这么说两个列表的大小相同。

回复收藏 0 原文

清风挽心 2024-07-20 04:08:56

时间：O(n) 空间：O(1) 用于识别交点的解决方案。

例如，两个给定节点将检测每次到达终点时交换指针来确定交点。此处有视频说明。

public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
    ListNode pA = headA;
    ListNode pB = headB;
    while (pA != pB) {
        pA = pA == null ? headB : pA.next;
        pB = pB == null ? headA : pB.next;
    }
    return pA;
}

谢谢。

编辑

我对交点的解释是找到交点。

例如：

对于给定的列表 A 和 B，A 和 B 将在点 c1 处“相遇/相交”，上面的算法将返回 c1 。正如 OP 所说，OP 无法访问 Hashmaps 或某种类型，我相信 OP 是说该算法应该具有 O(1) 空间复杂度。

我前段时间从 Leetcode 得到了这个想法，如果有兴趣的话：两个链接的交集列表。

Time: O(n) Space: O(1) Solution for identifying points of intersection.

For example, the two given nodes will detect the point of intersection by swapping pointers every time they reach the end. Video Explanation Here.

public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
    ListNode pA = headA;
    ListNode pB = headB;
    while (pA != pB) {
        pA = pA == null ? headB : pA.next;
        pB = pB == null ? headA : pB.next;
    }
    return pA;
}

Thanks.

Edit

My interpretation of intersection is finding the point of intersection.

For example:

For the given lists A and B, A and B will "meet/intersect" at point c1, and the algo above will return c1. As OP stated that OP has NO access to Hashmaps or some sort, I believe OP is saying that the algo should have O(1) space complexity.

I got this idea from Leetcode some time ago, if interested: Intersection of Two Linked Lists.

回复收藏 0 原文

~没有更多了~

关于作者

柒夜笙歌凉

暂无简介

文章

评论

360 人气

关注发私信

相关话题

热门标签

操作系统程序设计 IT运维 Linux系统管理 JavaScript 服务器应用 solaris C/C++ PHP Shell BSD Vue.js aix Oracle Python HTML 系统管理 HTML5 CSS 前端

推荐作者

知足的幸福

文章 0 评论 0

我一向站在原地

文章 0 评论 0

慕烟庭风

文章 0 评论 0

秉忠贞之诚守退让之实

文章 0 评论 0

小兔几

文章 0 评论 0

mb_3y7WUgWY

文章 0 评论 0

友情链接

我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的隐私政策了解更多相关信息。单击 接受 或继续使用网站，即表示您同意使用 Cookies 和您的相关数据。

原文