How to get the intersection of multiple nested lists?

Posted on 2025-02-05 02:54:23 Word count 954 Views 3 Comments 0

I am playing with Python and wonder how to get the intersection of multiple nested lists.

list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

Each list contains multiple small lists, and each small list contains two integers. Within a list, the first integers of the small lists are all different.

Now, if a first integer appears in every list, keep it and sum the corresponding second integers.

Then the result should be [[200,6],[300,18],[500,3]]

A friend solved this problem:

import pandas as pd
list1 = [[100, 1], [200, 2], [300, 99], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1],[1000, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
df3 = pd.DataFrame(list3)

df4 = df1.merge(df2, on=0, how='inner')

df5 = df4.merge(df3, on=0, how='inner')

df6 = df5.copy()
df6['sum'] = df6.iloc[:, 1:].sum(axis=1)
list4 = df6[[0, 'sum']].values.tolist()

print(list4)

When the data is huge, this runs slowly. Is there a way to use "set" to speed it up?

Comments (1)

把时间冻结 2025-02-12 02:54:23

Each list contains multiple small lists, each small list contains two integers. The first integer of each small list is different in each list.

I'd suggest converting your nested lists to dictionaries, which seem to make way more sense here -- not just for this operation, but for working with the data in general. Then, all you need is a simple dictionary comprehension.

>>> d1, d2, d3 = dict(list1), dict(list2), dict(list3)
>>> d1
{10: 1, 200: 2, 300: 9, 400: 1, 500: 1}
>>> {k: d1[k] + d2[k] + d3[k] for k in d1 if k in d2 and k in d3}
{200: 6, 300: 18, 500: 3}

Or the same as a nested list comprehension, if you really need lists:

>>> [[k, d1[k] + d2[k] + d3[k]] for k in d1 if k in d2 and k in d3]
[[200, 6], [300, 18], [500, 3]]
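
The comprehension above is written for exactly three dicts. As a minimal sketch of how the same idea might extend to any number of lists, the helper below (the name intersect_and_sum is hypothetical, not from the original answer) converts each nested list to a dict, then keeps the keys of the first dict that also occur in all the others:

```python
def intersect_and_sum(*nested_lists):
    """Keep first integers present in every list and sum their second integers.

    Hypothetical helper for illustration; assumes the first integers
    are unique within each list, as stated in the question.
    """
    if not nested_lists:
        return []
    dicts = [dict(pairs) for pairs in nested_lists]
    first, rest = dicts[0], dicts[1:]
    return [
        [k, v + sum(d[k] for d in rest)]
        for k, v in first.items()
        if all(k in d for d in rest)  # average O(1) lookup per dict
    ]


list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

result = intersect_and_sum(list1, list2, list3)
print(result)  # [[200, 6], [300, 18], [500, 3]]
```

The result follows the key order of the first list, which matches the output of the three-dict comprehension.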

About running time: After creating the dicts in O(n), the k in d2 and k in d3 checks are each O(1) on average, for an overall complexity of O(n), which should be about as fast as it gets.


About the suggestion from the comments: Using d1.keys() & d2.keys() & d3.keys() works, too, but does not seem to be any faster (in fact a bit slower, for 1000 elements with about 25% overlap).

>>> import random
>>> d1, d2, d3 = ({random.randrange(1000): random.randrange(1000) for _ in range(1000)} for _ in range(3))
>>> len(set(d1) & set(d2) & set(d3))
259
>>> %timeit {k: d1[k] + d2[k] + d3[k] for k in d1 if k in d2 and k in d3}
84.2 µs ± 1.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit {k: d1[k] + d2[k] + d3[k] for k in d1.keys() & d2.keys() & d3.keys()}
117 µs ± 817 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)