Best way to find subsets of a set from a group of sets

Posted 2025-01-06 16:30:15

First, sorry for the ambiguous title.

Assume I have the following two groups of sets:

Group 1

s1 = ( x1, y1 )
s2 = ( x2 )

Group 2

m1 = ( x1, y1, y2 )
m2 = ( x1 )
m3 = ( x1 , x2 )

For each set in Group 1 (call it s), I need to find the sets in Group 2 (call each one m) such that m is a subset of s.

So, for my example, the answer would be:

s1 -> m2
s2 -> nothing

For now, I'm storing the values in std::set, but I can change that if needed. Also, the sets can get big, so the algorithm needs to be efficient. I currently use a brute-force approach, which I'm not entirely satisfied with.

Any suggestions?

梦途 2025-01-13 16:30:15

The first step would be to sort Group 1 according to cardinality (i.e. size). Then the algorithm is something on the order of:

// Assumes group1 is already sorted by ascending set size.
for (const auto& M : group2) {
  for (const auto& S : group1) {
    if (S.size() < M.size()) continue;  // too small to contain M; a binary search can skip these
    if (std::includes(S.begin(), S.end(), M.begin(), M.end())) {
      /* M is a subset of S */
    }
  }
}

This should have a time complexity of roughly O(MSR), where M is the number of sets in Group 2, S is the number of sets in Group 1, and R is the size of the largest set in Group 1. (Here M and S are counts, not the loop variables used in the code.)
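The sorted-by-size loop, including the binary search mentioned in the comment, could be sketched as follows. This is only an illustration under some assumptions: the elements are taken to be strings, and the names `Set`, `find_subsets`, `group1`, and `group2` are hypothetical, not from the original post. It sorts indices into Group 1 by set size, then uses `std::partition_point` to jump to the first set that is large enough to contain M.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Set = std::set<std::string>;  // assumed element type

// Returns (index into group1, index into group2) pairs (i, j)
// such that group2[j] is a subset of group1[i].
std::vector<std::pair<std::size_t, std::size_t>>
find_subsets(const std::vector<Set>& group1, const std::vector<Set>& group2) {
    // Sort indices of group1 by ascending set size; group1 itself is untouched.
    std::vector<std::size_t> order(group1.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return group1[a].size() < group1[b].size();
    });

    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t j = 0; j < group2.size(); ++j) {
        const Set& m = group2[j];
        // Binary search for the first S with S.size() >= m.size().
        auto first = std::partition_point(
            order.begin(), order.end(),
            [&](std::size_t i) { return group1[i].size() < m.size(); });
        for (auto it = first; it != order.end(); ++it) {
            const Set& s = group1[*it];
            // Both ranges are sorted because std::set iterates in order.
            if (std::includes(s.begin(), s.end(), m.begin(), m.end()))
                matches.emplace_back(*it, j);
        }
    }
    return matches;
}
```

With the sets from the question, the only pair reported is s1 with m2. The size filter only prunes candidates; the worst case is still a full scan of Group 1 for each set in Group 2.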

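Edit: It just occurred to me that looking up each element of M with S.find() might be more efficient than calling std::includes() (which iterates sequentially), but I think that would only be true if M.size() is much smaller than S.size(): std::includes() is O(|M| + |S|), whereas repeated S.find() calls are O(|M| log |S|), where |M| and |S| are the sizes of the two sets.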
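A find-based subset test might look like the sketch below. The helper name `is_subset_by_find` and the string element type are assumptions made for illustration. Whether it actually beats `std::includes` depends on the relative sizes of the two sets, so it is worth measuring on real data.

```cpp
#include <set>
#include <string>

using Set = std::set<std::string>;  // assumed element type

// Hypothetical helper: true if every element of m is also in s.
// Performs one O(log |s|) lookup per element of m.
bool is_subset_by_find(const Set& m, const Set& s) {
    if (m.size() > s.size()) return false;  // a larger set cannot be a subset
    for (const auto& x : m) {
        if (s.find(x) == s.end()) return false;  // x is missing from s
    }
    return true;
}
```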
