集合列表的快速成对交集

发布于 2025-01-09 04:23:40 字数 904 浏览 0 评论 0原文

我有一个集合 a 列表和另一个集合 b 列表。

我想将a中的每个集合与b中的每个集合相交,如果结果不为空,则报告匹配集合的索引位置,以及交叉点。

我有以下嵌套的 for 循环,但想知道是否有可能更快的实现。

import random as ra    
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]

_match_detail = []

for s1 in range(len(a)):
    for s2 in range(len(b)):
        x = a[s1] & b[s2]
        lenx = len(x)
        if lenx > 0:
            _match_detail.append((s1,s2,lenx))
            

ab 各有大约 4000 组,在我的机器(i7-8700 CPU @3.20GHz、16GB 内存)上运行大约需要 20 秒。我最终将处理数十万组,所以这不能很好地扩展。

我看过这个上一个问题,但我认为它不适用。

I have a list of sets a and another list of sets b.

I want to intersect each set in a against every set in b, and if the result is not empty, then report the index positions of the matched sets, and the length of the intersection.

I have the following nested for-loop, but wondered if there was a faster implementation possible.

import random as ra    
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]

_match_detail = []

for s1 in range(len(a)):
    for s2 in range(len(b)):
        x = a[s1] & b[s2]
        lenx = len(x)
        if lenx > 0:
            _match_detail.append((s1,s2,lenx))
            

With about 4000 sets each in a and b, it takes about 20 seconds to run on my machine (an i7-8700 CPU @3.20GHz with 16GB memory). I'll eventually be dealing with hundreds of thousands of sets, so this doesn't scale nicely.

I've seen this previous question but I don't think it's applicable.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文