How do I remove overlapping items from a nested list?

Posted 2025-01-19 16:26:14 | 1990 characters | 5 views | 0 comments

I am trying to delete the overlapping values in a nested list.

The data looks like this:

[[22, 37, 'foobar'], [301, 306, 'foobar'],[369, 374, 'foobar'], [650, 672, 'foobar'], [1166, 1174, 'foobar'],[1469, 1477, 'foobar'],[2237, 2245, 'foobar'],[2702, 2724, 'foobar'],[3426, 3446, 'foobar'],[3505, 3513, 'foobar'],[3756, 3764, 'foobar'],[69524, 69535, 'foobar'],[3812, 3820, 'foobar'],[4034, 4057, 'foobar'],[4318, 4347, 'foobar'],[58531, 58548, 'foobar'],[4552, 4574, 'foobar'],[4854, 4861, 'foobar'],[5769, 5831, 'foobar'], [5976, 5986, 'foobar'],[6541, 6558, 'foobar'],[6541, 6608, 'foobar'],[7351, 7364, 'foobar'],[7351, 7364, 'foobar'], [7764, 7770, 'foobar'],[58540, 58548, 'foobar'],[69524, 69556, 'foobar']]

Some entries overlap in the values at index 0 and 1, for example:

 [6541, 6558, 'foobar'] overlaps with [6541, 6608, 'foobar']
 [7351, 7364, 'foobar'] overlaps with [7351, 7364, 'foobar']
 [58531, 58548, 'foobar'] overlaps with [58540, 58548, 'foobar']
 [69524, 69535, 'foobar'] overlaps with [69524, 69556, 'foobar']

I am trying to go through the list and remove the shorter entry of each overlapping pair. For example, if [6541, 6558, 'foobar'] overlaps with [6541, 6608, 'foobar'], I want to keep [6541, 6608, 'foobar'] and remove [6541, 6558, 'foobar'] from the list.

So far I have tried:

def clean_span(adata):
    data = adata.copy()
    rem_idx = []
    for i in range(len(data)-1):
        if data[i][0] in data[i+1] or data[i][1] in data[i+1]:
            print(" {} overlaps with {}".format(data[i], data[i+1]))
            rem_idx.append(i)
    
    for i in rem_idx:
        del data[i]
    return data

But this code always leaves some overlapping values behind.

The same happens with this approach:

def clean_span(adata):
    data = adata.copy()
    new_data = []
    for i in range(len(data)-1):
        if data[i][0] in data[i+1] or data[i][1] in data[i+1]:
            print(" {} overlaps with {}".format(data[i], data[i+1]))
            new_data.append(data[i+1])
        else:
            new_data.append(data[i])
    return new_data

I would appreciate your help to solve this problem.

Comments (3)

红焚 2025-01-26 16:26:14
def clean_span(adata):
  # to perform O(1) search for index 0 and 1
  # you can just have one dictionary if indexes don't matter
  d0 = dict()
  d1 = dict()

  r = []
  for a in adata:
    if a[0] in d0:
      print(str(a) + " overlaps with " + str(d0[a[0]]))
    elif a[1] in d1:
      print(str(a) + " overlaps with " + str(d1[a[1]]))
    else:  
      r.append(a)
      d0[a[0]] = a
      d1[a[1]] = a

  return r

Keep two dictionaries: one keyed on the first element and one keyed on the second. While iterating over the data, check whether either key already exists in its dictionary. If it does, the entry overlaps one that was already kept; otherwise it does not.
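
The function above keeps whichever entry it meets first, whereas the question asks to keep the longer one. One possible adjustment, sketched here as an assumption rather than as part of the answer, is to visit the entries from longest to shortest, so that the first entry seen for a given start or end value is also the longest. The name clean_span_keep_longest is hypothetical, and like the original it only detects entries that share a start or an end value.

```python
def clean_span_keep_longest(spans):
    """Keep one [start, end, label] entry per shared start or end, preferring the longest."""
    seen_starts = set()
    seen_ends = set()
    keep = set()  # indices of the entries that survive

    # Visit indices from the longest span to the shortest. sorted() is stable,
    # so entries of equal length keep their original relative order.
    order = sorted(range(len(spans)),
                   key=lambda i: spans[i][1] - spans[i][0],
                   reverse=True)
    for i in order:
        start, end = spans[i][0], spans[i][1]
        if start in seen_starts or end in seen_ends:
            continue  # shares a start or end with a longer (or earlier) entry
        seen_starts.add(start)
        seen_ends.add(end)
        keep.add(i)

    # Rebuild the result in the original list order.
    return [spans[i] for i in range(len(spans)) if i in keep]


sample = [[22, 37, 'foobar'], [6541, 6558, 'foobar'], [6541, 6608, 'foobar'],
          [7351, 7364, 'foobar'], [7351, 7364, 'foobar'],
          [58531, 58548, 'foobar'], [58540, 58548, 'foobar']]
print(clean_span_keep_longest(sample))
```

On this sample, taken from the question's data, the shorter [6541, 6558, 'foobar'] and [58540, 58548, 'foobar'] are dropped, together with the second copy of [7351, 7364, 'foobar'].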


Problem in your code:

if data[i][0] in data[i+1] or data[i][1] in data[i+1]:
    print(" {} overlaps with {}".format(data[i], data[i+1]))
    new_data.append(data[i+1])
else:
    new_data.append(data[i])

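In the if branch you append data[i+1] to new_data. When the loop then advances to i + 1, that same element is usually appended a second time by the else branch, so it ends up in the list twice. The loop also compares only neighbouring elements, so overlapping entries that are not adjacent in the list (such as [69524, 69535, 'foobar'] and [69524, 69556, 'foobar']) are never detected.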


Side note:

if data[i][0] in data[i+1] or data[i][1] in data[i+1]:

Each `in` test here scans the whole sublist data[i+1] instead of comparing a single position, which makes the time complexity O(nk) for n sublists of length k.
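
A minimal sketch of explicit comparisons that avoid the sublist scan, assuming each sublist is [start, end, label] with start <= end. The helper names shares_endpoint and spans_intersect are hypothetical, the values of a and b are made up for illustration, and treating the end value as inclusive is an assumption.

```python
def shares_endpoint(a, b):
    """True if two [start, end, label] entries have the same start or the same end."""
    return a[0] == b[0] or a[1] == b[1]


def spans_intersect(a, b):
    """True if the closed ranges [a[0], a[1]] and [b[0], b[1]] share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


a = [10, 37, 'foobar']
b = [37, 50, 'foobar']
print(a[1] in b)              # True: 37 matches b's start, not b's end
print(shares_endpoint(a, b))  # False: the starts differ and the ends differ
print(spans_intersect(a, b))  # True, but only because the end is treated as inclusive
```

The `in` test matches a value in any position of the other sublist, so it can report a match that a position-by-position comparison would not.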

懒的傷心 2025-01-26 16:26:14

Overlaps can be found with set.intersection.

import itertools as it

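l = [[6541, 6558, 'foobar'], [6541, 6608, 'foobar'], [22, 37, 'foobar']]  # sample rows from the question; use the full list here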

# merge the first two entries of the sublist into a from-to set values
m = ((set(range(p[0], p[1]+1)), p[2]) for p in l)

# combine each element of the list to check overlapping
new_l = []
for p1, p2 in it.combinations(m, 2):
    s1, l1 = p1
    s2, l2 = p2

    if set.intersection(s1, s2):
        m1, M1 = min(s1), max(s1)
        m2, M2 = min(s2), max(s2)

        # choose the biggest one
        if M2-m2 > M1-m1:
            new_l.append((m2, M2, l2))
        else:
            new_l.append((m1, M1, l1))

print(sorted(new_l, key=lambda p: p[0]))
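
The loop above collects only the winner of each overlapping pair, so entries that overlap nothing never reach new_l, and building a set per entry costs memory proportional to the span length. A possible alternative, sketched under assumptions: sort by start and sweep once, keeping the longest entry of each run of overlapping entries and treating end values as inclusive (as the `p[1]+1` above does). The name drop_overlaps is hypothetical, and the result comes back sorted by start rather than in the original order.

```python
def drop_overlaps(spans):
    """Return the spans sorted by start, keeping only the longest of each overlapping run."""
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    result = []
    run_end = None  # largest end value seen in the current run of overlapping spans

    for span in ordered:
        if result and span[0] <= run_end:
            # This span overlaps the current run: extend the run and
            # keep it only if it is strictly longer than the current best.
            run_end = max(run_end, span[1])
            best = result[-1]
            if span[1] - span[0] > best[1] - best[0]:
                result[-1] = span
        else:
            # No overlap: start a new run with this span.
            result.append(span)
            run_end = span[1]
    return result


sample = [[69524, 69535, 'foobar'], [22, 37, 'foobar'], [58531, 58548, 'foobar'],
          [6541, 6558, 'foobar'], [6541, 6608, 'foobar'], [7351, 7364, 'foobar'],
          [7351, 7364, 'foobar'], [58540, 58548, 'foobar'], [69524, 69556, 'foobar']]
print(drop_overlaps(sample))
```

Because the input is sorted first, overlapping entries that sit far apart in the original list, such as the two that start at 69524, end up next to each other and are compared.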
薄荷梦 2025-01-26 16:26:14

I convert your list to a dict. The keys are foobar0, foobar1, ... and the values are the first items of the nested lists. The second loop builds rev_dict, and result holds the values that occur more than once in it. The third loop finds the sublists that start with one of those values and removes that value from each of them.

l_ist = [[22, 37, 'foobar'], ....]
ini_dict = {}
rev_dict = {}
for i in range(len(l_ist)):
    ini_dict[l_ist[i][2] + str(i)] = l_ist[i][0]
for key, value in ini_dict.items():
    rev_dict.setdefault(value, set()).add(key)

result = [key for key, values in rev_dict.items() if len(values) > 1]  # start values that occur more than once in rev_dict

for i in result:
    for j in range(len(l_ist)):
        if i == l_ist[j][0]:
            l_ist[j].remove(i)
print(l_ist)