How do I remove overlapping items from a nested list?
I am trying to delete the overlapping values in a nested list.
The data looks like this:
[[22, 37, 'foobar'], [301, 306, 'foobar'],[369, 374, 'foobar'], [650, 672, 'foobar'], [1166, 1174, 'foobar'],[1469, 1477, 'foobar'],[2237, 2245, 'foobar'],[2702, 2724, 'foobar'],[3426, 3446, 'foobar'],[3505, 3513, 'foobar'],[3756, 3764, 'foobar'],[69524, 69535, 'foobar'],[3812, 3820, 'foobar'],[4034, 4057, 'foobar'],[4318, 4347, 'foobar'],[58531, 58548, 'foobar'],[4552, 4574, 'foobar'],[4854, 4861, 'foobar'],[5769, 5831, 'foobar'], [5976, 5986, 'foobar'],[6541, 6558, 'foobar'],[6541, 6608, 'foobar'],[7351, 7364, 'foobar'],[7351, 7364, 'foobar'], [7764, 7770, 'foobar'],[58540, 58548, 'foobar'],[69524, 69556, 'foobar']]
Some items across the list overlap in the values at index 0 and 1. For example:
[6541, 6558, 'foobar'] overlaps with [6541, 6608, 'foobar']
[7351, 7364, 'foobar'] overlaps with [7351, 7364, 'foobar']
[58531, 58548, 'foobar'] overlaps with [58540, 58548, 'foobar']
[69524, 69535, 'foobar'] overlaps with [69524, 69556, 'foobar']
I am trying to go through the list and remove the shorter, first instance of each pair of overlapping values. If [6541, 6558, 'foobar'] overlaps with [6541, 6608, 'foobar'], I want to keep [6541, 6608, 'foobar'] and remove [6541, 6558, 'foobar'] from the list.
So far I have tried:
def clean_span(adata):
    data = adata.copy()
    rem_idx = []
    for i in range(len(data)-1):
        if data[i][0] in data[i+1] or data[i][1] in data[i+1]:
            print(" {} overlaps with {}".format(data[i], data[i+1]))
            rem_idx.append(i)
    for i in rem_idx:
        del data[i]
    return data
But this code always leaves some overlapping values behind.
The same happens with this approach:
def clean_span(adata):
    data = adata.copy()
    new_data = []
    for i in range(len(data)-1):
        if data[i][0] in data[i+1] or data[i][1] in data[i+1]:
            print(" {} overlaps with {}".format(data[i], data[i+1]))
            new_data.append(data[i+1])
        else:
            new_data.append(data[i])
    return new_data
I would appreciate any help with solving this problem.
Keep two dictionaries: one keyed by the first element of each item and one keyed by the second. While iterating over the data, check whether either value already exists as a key in the corresponding dictionary. If the key is found, the item overlaps an earlier one; otherwise it does not.

Problem in your code:
In the if branch you append data[i+1] to new_data. When the loop then advances to i + 1, that same element is now data[i], the else branch runs, and it is appended to the list again. The loop also compares only neighbouring items, so overlaps between items that are far apart in the list are never detected.

Side note:
The test data[i][0] in data[i+1] searches the entire inner list for the value, which makes the time complexity O(nk) (with n items of k elements each).
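The lookup idea above could be sketched roughly as follows. This is only one possible reading of it, under a few assumptions: an "overlap" means two items share a start or an end value, plain sets stand in for the two dictionaries because only key lookups are needed, and the name clean_span_by_endpoints is made up for illustration. Visiting the longest spans first is an added step so that the longer item of a clashing pair is the one that survives.

```python
def clean_span_by_endpoints(spans):
    """Drop items that share a start or end value with a longer item.

    Assumption: "overlap" means an identical start (index 0) or end (index 1).
    """
    # Visit the longest spans first so the longer span of a clashing pair wins.
    # sorted() is stable, so among equal lengths the earlier item comes first.
    order = sorted(range(len(spans)),
                   key=lambda i: spans[i][1] - spans[i][0],
                   reverse=True)

    seen_starts = set()  # start values of the spans kept so far
    seen_ends = set()    # end values of the spans kept so far
    keep = []
    for i in order:
        start, end = spans[i][0], spans[i][1]
        if start in seen_starts or end in seen_ends:
            continue  # shares an endpoint with a span that was already kept
        seen_starts.add(start)
        seen_ends.add(end)
        keep.append(i)

    # Restore the original list order for the surviving items.
    return [spans[i] for i in sorted(keep)]


sample = [[6541, 6558, 'foobar'], [6541, 6608, 'foobar'],
          [7351, 7364, 'foobar'], [7351, 7364, 'foobar'],
          [58531, 58548, 'foobar'], [22, 37, 'foobar'],
          [58540, 58548, 'foobar']]
print(clean_span_by_endpoints(sample))
# [[6541, 6608, 'foobar'], [7351, 7364, 'foobar'],
#  [58531, 58548, 'foobar'], [22, 37, 'foobar']]
```

Each lookup is constant time on average, so the cost is dominated by the sort. Note that this sketch does not catch spans that overlap without sharing an endpoint, such as [10, 20] and [15, 25].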
Overlaps can be found with set.intersection.
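A minimal sketch of how set.intersection might be applied here, assuming the first two values of each item are inclusive integer positions. The helper names overlaps and remove_overlaps are hypothetical, and keeping the longest span of each overlapping group is taken from the question rather than from this answer.

```python
def overlaps(a, b):
    """True if spans a and b share at least one position (ends inclusive)."""
    positions_a = set(range(a[0], a[1] + 1))
    # set.intersection accepts any iterable, so the second range is not copied.
    return bool(positions_a.intersection(range(b[0], b[1] + 1)))


def remove_overlaps(spans):
    """Keep the longest span of every overlapping group.

    Note: the result is ordered by start position, not by input order.
    """
    kept = []
    # Longest first, so a shorter overlapping span is the one that is dropped.
    for span in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        if not any(overlaps(span, other) for other in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s[0])


sample = [[6541, 6558, 'foobar'], [6541, 6608, 'foobar'],
          [22, 37, 'foobar'], [30, 40, 'foobar']]
print(remove_overlaps(sample))
# [[22, 37, 'foobar'], [6541, 6608, 'foobar']]
```

Building a set for every comparison is simple but slow for long spans or large lists. Comparing endpoints directly (a[0] <= b[1] and b[0] <= a[1]) gives the same answer for integer spans without materialising the positions.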