如何在具有递归函数的列表中将重复连续值的重复值列表分组？

发布于 2025-01-29 20:13:51 字数 3028 浏览 0 评论 0原文

我想将连续值分组，如果它重复，每个值仅在一个组中，让我们在下面查看我的示例：

note ：result > test_list

test_list = ["1","2","1","2","1","1","5325235","2","62623","1","1"]
--->results = [[[0, 1], [2, 3]],
               [[4, 5], [9, 10]]]

test_list = ["1","2","1","1","2","1","5325235","2","62623","1","2","1","236","2388","626236437","1","2","1","236","2388"]
--->results = [[[9, 10, 11, 12, 13], [15, 16, 17, 18, 19]],
               [[0, 1, 2], [3, 4, 5]]]

i构建递归功能：

def group_duplicate_continuous_value(list_label_group):

    # how to know which continuous value is duplicate, I implement take next number minus the previous number
    list_flag_grouping = [str(int(j.split("_")[0]) - int(i.split("_")[0])) +f"_{j}_{i}" for i,j in zip(list_label_group,list_label_group[1:])]

    # I find duplicate value in list_flag_grouping
    counter_elements = Counter(list_flag_grouping)
    list_have_duplicate = [k for k,v in counter_elements.items() if v > 1]

    if len(list_have_duplicate) > 0:
        list_final_index = group_duplicate_continuous_value(list_flag_grouping)
        # To return exactly value, I use index to define
        for k, v in list_final_index.items():
            temp_list = [v[i] + [v[i][-1] + 1] for i in range(0,len(v))]
            list_final_index[k] = temp_list
        check_first_cursive = list_label_group[0].split("_")
    
        # If we have many list grouping duplicate countinous value with different length, we need function below to return exactly results
        if len(check_first_cursive) > 1:
            list_temp_index = find_index_duplicate(list_label_group)
            list_duplicate_index = list_final_index.values()
            list_duplicate_index = [val for sublist in list_duplicate_index for val1 in sublist for val in val1]
            for k,v in list_temp_index.items():
                list_index_v = [val for sublist in v for val in sublist]
                if any(x in list_index_v for x in list_duplicate_index) is False:
                    list_final_index[k] = v
        
                    
        return list_final_index
    else:
        if len(list_label_group) > 0:
            check_first_cursive = list_label_group[0].split("_")
            if len(check_first_cursive) > 1:
                list_final_index = find_index_duplicate(list_label_group)
                return list_final_index
        list_final_index = None
        return list_final_index

支持功能：

def find_index_duplicate(list_data):
    dups = defaultdict(list)
    for i, e in enumerate(list_data):
        dups[e].append([i])
    new_dict = {key:val for key, val in dups.items() if len(val) >1}
    return new_dict

但是当我使用test_list = [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5， 5,5,5,5,5,5,5,5,5,1,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5， 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5]，它非常慢，使使使（〜6GB）。我知道一个原因是我的递归功能的堆栈溢出group_duplate_continuun_value，但我不知道如何修复它。

原文

I want to group consecutive values if it's duplicates and each value is just in one group, let's see my example below:

Note: results is an index of the value in test_list

test_list = ["1","2","1","2","1","1","5325235","2","62623","1","1"]
--->results = [[[0, 1], [2, 3]],
               [[4, 5], [9, 10]]]

test_list = ["1","2","1","1","2","1","5325235","2","62623","1","2","1","236","2388","626236437","1","2","1","236","2388"]
--->results = [[[9, 10, 11, 12, 13], [15, 16, 17, 18, 19]],
               [[0, 1, 2], [3, 4, 5]]]

I build a recursive function:

def group_duplicate_continuous_value(list_label_group):

    # how to know which continuous value is duplicate, I implement take next number minus the previous number
    list_flag_grouping = [str(int(j.split("_")[0]) - int(i.split("_")[0])) +f"_{j}_{i}" for i,j in zip(list_label_group,list_label_group[1:])]

    # I find duplicate value in list_flag_grouping
    counter_elements = Counter(list_flag_grouping)
    list_have_duplicate = [k for k,v in counter_elements.items() if v > 1]

    if len(list_have_duplicate) > 0:
        list_final_index = group_duplicate_continuous_value(list_flag_grouping)
        # To return exactly value, I use index to define
        for k, v in list_final_index.items():
            temp_list = [v[i] + [v[i][-1] + 1] for i in range(0,len(v))]
            list_final_index[k] = temp_list
        check_first_cursive = list_label_group[0].split("_")
    
        # If we have many list grouping duplicate countinous value with different length, we need function below to return exactly results
        if len(check_first_cursive) > 1:
            list_temp_index = find_index_duplicate(list_label_group)
            list_duplicate_index = list_final_index.values()
            list_duplicate_index = [val for sublist in list_duplicate_index for val1 in sublist for val in val1]
            for k,v in list_temp_index.items():
                list_index_v = [val for sublist in v for val in sublist]
                if any(x in list_index_v for x in list_duplicate_index) is False:
                    list_final_index[k] = v
        
                    
        return list_final_index
    else:
        if len(list_label_group) > 0:
            check_first_cursive = list_label_group[0].split("_")
            if len(check_first_cursive) > 1:
                list_final_index = find_index_duplicate(list_label_group)
                return list_final_index
        list_final_index = None
        return list_final_index

Support function:

def find_index_duplicate(list_data):
    dups = defaultdict(list)
    for i, e in enumerate(list_data):
        dups[e].append([i])
    new_dict = {key:val for key, val in dups.items() if len(val) >1}
    return new_dict

But when I run with test_list = [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5], it's very slow and make out of memory (~6GB). I knew a reason is stack overflow of my recursive function group_duplicate_continuous_value but I don't know how to fix it.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

樱娆 2025-02-05 20:13:51

您可以创建一个列表的命令，其中原始列表中的每个项目都是dict中的键，每个键都映射到其原始列表中的索引列表。例如，您的列表[“ 1”，“ 3”，“ 5”，“ 5”，“ 7”，“ 1”，“ 3”，“ 5”]将导致dict {“ 1”：[0，5]，“ 3”：[1，6]，“ 5”：[2，3，7]，“ 7”：[4]}。

以这种方式创建列表的命令在Python中也非常惯用，而且快速：可以通过在列表中迭代一次来完成。

def build_dict(l):
    d = {}
    for i, x in enumerate(l):
        d.setdefault(x, []).append(i)
    return d

l = ["1","3","5","5","7","1","3","5"]
d = build_dict(l)
print(d)
# {'1': [0, 5], '3': [1, 6], '5': [2, 3, 7], '7': [4]}

然后，您可以在dict上迭代构建两个索引列表：

def build_index_results(l):
    d = build_dict(l)
    idx1, idx2 = [], []
    for v in d.values():
        if len(v) > 1:
            idx1.append(v[0])
            idx2.append(v[1])
    return idx1, idx2

print(build_index_results(l))
# ([0, 1, 2], [5, 6, 3])

或使用zip：

from operator import itemgetter

def build_index_results(l):
    d = build_dict(l)
    return list(zip(*map(itemgetter(0,1), (v for v in d.values() if len(v) > 1))))

print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]

我无法抗拒showcasing more_itertools.map_reduce。

from more_itertools import map_reduce
from operator import itemgetter

def build_index_results(l):
    d = map_reduce(enumerate(l),
                   keyfunc=itemgetter(1),
                   valuefunc=itemgetter(0),
                   reducefunc=lambda v: v[:2] if len(v) > 1 else None
    )
    return list(zip(*filter(None, d.values())))

print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]

You can create a dict of lists, where every item from the original list is a key in the dict, and every key is mapped to the list of its indices in the original list. For instance, your list ["1","3","5","5","7","1","3","5"] would result in the dict {"1": [0, 5], "3": [1, 6], "5": [2, 3, 7], "7": [4]}.

Creating a dict of lists in this way is very idiomatic in python, and fast, too: it can be done by iterating just once on the list.

def build_dict(l):
    d = {}
    for i, x in enumerate(l):
        d.setdefault(x, []).append(i)
    return d

l = ["1","3","5","5","7","1","3","5"]
d = build_dict(l)
print(d)
# {'1': [0, 5], '3': [1, 6], '5': [2, 3, 7], '7': [4]}

Then you can iterate on the dict to build two lists of indices:

def build_index_results(l):
    d = build_dict(l)
    idx1, idx2 = [], []
    for v in d.values():
        if len(v) > 1:
            idx1.append(v[0])
            idx2.append(v[1])
    return idx1, idx2

print(build_index_results(l))
# ([0, 1, 2], [5, 6, 3])

Or using zip:

from operator import itemgetter

def build_index_results(l):
    d = build_dict(l)
    return list(zip(*map(itemgetter(0,1), (v for v in d.values() if len(v) > 1))))

print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]

I can't resist showcasing more_itertools.map_reduce for this:

from more_itertools import map_reduce
from operator import itemgetter

def build_index_results(l):
    d = map_reduce(enumerate(l),
                   keyfunc=itemgetter(1),
                   valuefunc=itemgetter(0),
                   reducefunc=lambda v: v[:2] if len(v) > 1 else None
    )
    return list(zip(*filter(None, d.values())))

print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]

回复收藏 0 原文

~没有更多了~