How to get the common elements in a deeply nested list: my two solutions work but take some time
I have a nested list structure as below. Each of the 4 nested structures represents some free positions for me. I want to find which elements are present in all 4 nested lists.
ary=[ [[0, 4], [5, 11]], [[0, 2], [0, 4], [5,10]], [[0, 4], [0, 14], [5,11]], [[0, 4], [0, 14], [5,11]] ]
As shown above, in the first nested list [[0, 4], [5, 11]], the element [0, 4] is present in all four lists but [5, 11] is not. Hence, my answer should be [[0, 4]] or even just [0, 4].
I did this in two ways.
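Solution 1:
ary1 = ary
# Flatten every sublist into one list of [start, end] pairs.
newlist = [item for items in ary for item in items]
# Keep the items of the first sublist whose total count equals the number of sublists.
x = [i for i in ary[0] if newlist.count(i) == len(ary1)]
# OUTPUT is x = [[0, 4]]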
Solution 2:
x = []
for u in ary[0]:
    # One marker for each of the remaining sublists that contains u.
    n = [1 for t in range(1, len(ary)) if u in ary[t]]
    if len(ary) - 1 == len(n):
        x.append(u)
# OUTPUT is x = [[0, 4]]
These two take similar computational time when checked with line profiler. This is the only point of heavy computation in my hundreds of lines of code, and I want to reduce it. Can you suggest any other Python commands/code that do the task faster than these two solutions?
Comments (3)
You can try converting each nested list at the second level into a set of tuples, where each lowest-level list (e.g. [0, 4]) becomes an element of the set.
The conversion to tuples is required because lists are not hashable.
Once each nested list of lists is a set, simply find their intersection.
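The set-based idea above can be sketched as follows. This is only a minimal illustration: the function name common_elements is made up for this example, and it assumes every lowest-level item is a list of hashable values such as integers.

```python
ary = [
    [[0, 4], [5, 11]],
    [[0, 2], [0, 4], [5, 10]],
    [[0, 4], [0, 14], [5, 11]],
    [[0, 4], [0, 14], [5, 11]],
]


def common_elements(nested):
    """Return the inner lists that occur in every sublist of `nested`.

    `common_elements` is a hypothetical helper name, not part of any library.
    """
    if not nested:
        return []
    # Lists are unhashable, so each inner list becomes a tuple first.
    sets = [set(map(tuple, sub)) for sub in nested]
    shared = set.intersection(*sets)
    # Walk the first sublist so the result keeps its original order.
    return [item for item in nested[0] if tuple(item) in shared]


print(common_elements(ary))  # prints [[0, 4]]
```

Each membership test on a set takes roughly constant time on average, whereas list.count and the in operator on a list scan the whole list, so this should scale better as the sublists grow. It is still worth profiling on your real data.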
I'm not suggesting you do it like this; I just came up with another way so you can compare it. Maybe it is a lucky shot and needs less computation than your approaches.
It requires that there is always at least one element present in every nested list, because I don't check for that explicitly.
Two approaches that I can think of.
Option 1:
Depending on how far you are willing to take this and how big the target list is going to be, it may need a little bit of time and testing.
If so, consider option 2. My theory, though, is to turn this into a binary (divide-and-conquer) approach instead of iterating through each direct element of the outer list.
It may need itertools.tee() and/or multithreading with a recursive function, depending on the length of the outer list.
The recursive function splits the list in half on every call until the pieces are short enough to start ruling out uncommon elements (like [5, 11]).
The common elements are then passed back up the recursion hierarchy.
Designing this more carefully should help you set conditions and thresholds that avoid runaway behavior such as excessive thread counts.
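A single-threaded sketch of the halving idea might look like the following. It deliberately leaves out itertools.tee() and threading, so it only illustrates the recursion itself; the name common_by_halving is an assumption made for this example, and the result is a set of tuples, not a list of lists.

```python
def common_by_halving(nested):
    """Intersect the sublists of `nested` by recursive halving.

    `common_by_halving` is a hypothetical helper name used for illustration.
    """
    if not nested:
        return set()
    if len(nested) == 1:
        # Base case: a single sublist, converted to hashable tuples.
        return set(map(tuple, nested[0]))
    mid = len(nested) // 2
    left = common_by_halving(nested[:mid])
    if not left:
        # Nothing survived in the left half, so the right half cannot add anything.
        return set()
    # Common elements are passed back up by intersecting the two halves.
    return left & common_by_halving(nested[mid:])


ary = [
    [[0, 4], [5, 11]],
    [[0, 2], [0, 4], [5, 10]],
    [[0, 4], [0, 14], [5, 11]],
    [[0, 4], [0, 14], [5, 11]],
]
print(sorted(common_by_halving(ary)))  # prints [(0, 4)]
```

Whether the recursion beats a flat intersection over all sublists would have to be measured; for a short outer list the extra function calls may well cost more than they save.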
Option 2:
It seems that all the second-level lists (e.g., [[0, 2], [0, 4], [5, 10]]) are sorted. If they are not, sort them, eliminate (pop) duplicates, merge them all together using the + operator, and then sort the result again.
After that you end up with a structure containing as many [0, 4]'s as the length of ary1.
That could be your condition for identifying [0, 4] as the answer.
Again, that may need to be tested.
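The sort, merge and count steps can be sketched roughly like this. The helper name common_by_counting is invented for the example, and it swaps the manual pop of duplicates for a set and uses itertools.groupby to measure each run of equal items, which is one possible reading of the description, not necessarily the intended one.

```python
from itertools import groupby


def common_by_counting(nested):
    """Return the items whose count in the merged, sorted list equals len(nested).

    `common_by_counting` is a hypothetical helper name used for illustration.
    """
    merged = []
    for sub in nested:
        # De-duplicate within one sublist so it contributes each item at most once.
        merged.extend(set(map(tuple, sub)))
    merged.sort()
    # After sorting, equal items are adjacent; a run as long as the number
    # of sublists means the item occurred in every one of them.
    return [
        list(key)
        for key, run in groupby(merged)
        if len(list(run)) == len(nested)
    ]


ary = [
    [[0, 4], [5, 11]],
    [[0, 2], [0, 4], [5, 10]],
    [[0, 4], [0, 14], [5, 11]],
    [[0, 4], [0, 14], [5, 11]],
]
print(common_by_counting(ary))  # prints [[0, 4]]
```

The sort dominates the cost here, so this is unlikely to be faster than a direct set intersection, but it follows the counting condition described above and can be compared against it in a profiler.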