How do I get the intersection of multiple nested lists?
I am playing with Python and wondering how to get the intersection of multiple nested lists.
list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]
Each list contains multiple small lists, and each small list contains two integers. Within a given list, the first integer of every small list is unique.
Now, if a first integer appears in every list, keep it and sum the corresponding second integers.
The result should then be [[200, 6], [300, 18], [500, 3]].
A friend solved this problem:
import pandas as pd
list1 = [[100, 1], [200, 2], [300, 99], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1],[1000, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
df3 = pd.DataFrame(list3)
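# Inner-join on column 0 (the first integer) so only keys present in all three frames remain
df4 = df1.merge(df2, on=0, how='inner')
df5 = df4.merge(df3, on=0, how='inner')
df6 = df5.copy()
# Sum every column except the key column, row by row
df6['sum'] = df6.iloc[:, 1:].sum(axis=1)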
list4 = df6[[0, 'sum']].values.tolist()
print(list4)
When the data is huge, this runs slowly. Is there a way to use "set" to speed it up?
Comments (1)
I'd suggest converting your nested lists to dictionaries, which seem to make way more sense here, not just for this operation but for working with the data in general. Then, all you need is a simple dictionary comprehension.
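The answer's original code did not survive on this page, so what follows is only a minimal sketch of what such a dictionary comprehension might look like, not the author's exact code. The names `d1`, `d2` and `d3` are taken from the answer's later text; `result` is an assumed name.

```python
list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

# dict() accepts an iterable of two-item sequences:
# the first item becomes the key, the second the value.
d1, d2, d3 = dict(list1), dict(list2), dict(list3)

# Keep only keys that occur in all three dicts and add up their values.
# "result" is an assumed name, not one from the original answer.
result = {k: v + d2[k] + d3[k] for k, v in d1.items() if k in d2 and k in d3}

print(result)  # {200: 6, 300: 18, 500: 3}
```

This relies on the question's statement that the first integer is unique within each list; with repeated keys, `dict()` would silently keep only the last pair.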
Or the same as a nested list comprehension, if you really need lists:
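Again as a hedged reconstruction rather than the answer's own code, the list-producing variant could be sketched like this. `result_list` is an assumed name.

```python
list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

d1, d2, d3 = dict(list1), dict(list2), dict(list3)

# Same filter as the dict version, but each match is emitted as a [key, sum] pair.
# Pairs come out in the order the keys appear in list1.
result_list = [[k, v + d2[k] + d3[k]] for k, v in d1.items() if k in d2 and k in d3]

print(result_list)  # [[200, 6], [300, 18], [500, 3]]
```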
About running time: after creating the dicts in O(n), the
if k in d2 and k in d3
checks are O(1) each, for an overall complexity of O(n), which should be about as fast as it gets.
About the suggestion from the comments: using
d1.keys() & d2.keys() & d3.keys()
works too, but does not seem to be any faster (a bit slower, in fact, for 1000 elements with 25% overlap).
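For comparison, the key-intersection variant mentioned above might look like the following sketch. The names `common` and `result_list` are assumptions, and the `sorted()` call is an addition here because a set has no guaranteed order.

```python
list1 = [[10, 1], [200, 2], [300, 9], [400, 1], [500, 1]]
list2 = [[22, 1], [200, 2], [300, 9], [900, 1], [660, 1], [500, 1]]
list3 = [[30, 1], [200, 2], [300, 0], [400, 1], [500, 1]]

d1, d2, d3 = dict(list1), dict(list2), dict(list3)

# Dict key views support set operations; "&" yields the keys common to all three.
common = d1.keys() & d2.keys() & d3.keys()

# Sort the shared keys for a deterministic order, then sum the three values.
result_list = [[k, d1[k] + d2[k] + d3[k]] for k in sorted(common)]

print(result_list)  # [[200, 6], [300, 18], [500, 3]]
```

Which variant is faster on real data is best checked with `timeit` on inputs of realistic size and overlap.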