集合列表的快速成对交集
我有一个集合 a
列表和另一个集合 b
列表。
我想将a
中的每个集合与b
中的每个集合相交,如果结果不为空,则报告匹配集合的索引位置,以及交叉点。
我有以下嵌套的 for 循环,但想知道是否有可能更快的实现。
import random as ra
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
_match_detail = []
for s1 in range(len(a)):
for s2 in range(len(b)):
x = a[s1] & b[s2]
lenx = len(x)
if lenx > 0:
_match_detail.append((s1,s2,lenx))
a
和 b
各有大约 4000 组,在我的机器(i7-8700 CPU @3.20GHz、16GB 内存)上运行大约需要 20 秒。我最终将处理数十万组,所以这不能很好地扩展。
我看过这个上一个问题,但我认为它不适用。
I have a list of sets a
and another list of sets b
.
I want to intersect each set in a
against every set in b
, and if the result is not empty, then report the index positions of the matched sets, and the length of the intersection.
I have the following nested for-loop, but wondered if there was a faster implementation possible.
import random as ra
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
_match_detail = []
for s1 in range(len(a)):
for s2 in range(len(b)):
x = a[s1] & b[s2]
lenx = len(x)
if lenx > 0:
_match_detail.append((s1,s2,lenx))
With about 4000 sets each in a
and b
, it takes about 20 seconds to run on my machine (an i7-8700 CPU @3.20GHz with 16GB memory). I'll eventually be dealing with hundreds of thousands of sets, so this doesn't scale nicely.
I've seen this previous question but I don't think it's applicable.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论