Efficient algorithm to check whether a value is in a list of lists and retrieve the indices of the elements that contain it

Posted on 2025-01-23 20:58:29

My goal is to efficiently find, in a large list of lists (for example, 1 million entries, each a list of 3 elements), the indices of the elements that contain a certain value.

For example, take the list a:

a = [[0,1,2],[0,5,6],[7,8,9]]

I want to retrieve the indices of the elements containing the value 0, so my function should return 0 and 1.

My first attempt was the following:

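def any_identical_value(elements, value):
    # Return True as soon as one item of the sublist equals the searched value
    for el in elements:
        if el == value:
            return True
    return False


def get_dual_points(compliant_cells, value):
    # Linear scan: collect the index of every sublist that contains the value
    compliant = [i for i, e in enumerate(compliant_cells) if any_identical_value(e, value)]
    return compliant


result = get_dual_points(a, 0)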

The solution works correctly, but it is highly inefficient for a large list of lists, because every call scans the whole list. In particular, my goal is to perform as many queries as there are values in the main list, i.e. n_queries = len(a)*3 (9 in the example above).
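To make the workload concrete, here is a minimal sketch that runs one query per value stored in `a`, as the question describes. `answer_all_queries_naive` is a hypothetical helper, and `value in cell` stands in for `any_identical_value`, since Python's `in` operator performs the same element-by-element comparison on a list. Each query rescans all `len(a)` sublists, so the total work grows quadratically with `len(a)`.

```python
def get_dual_points(compliant_cells, value):
    # Linear scan, same idea as the question's code:
    # `value in cell` compares value against each item of the sublist.
    return [i for i, cell in enumerate(compliant_cells) if value in cell]


def answer_all_queries_naive(a):
    # Hypothetical helper: one query per value stored in `a`
    # (n_queries = len(a) * 3 for sublists of 3 elements).
    # Each query is O(len(a)), so the whole workload is O(len(a) ** 2).
    return [get_dual_points(a, value) for cell in a for value in cell]


a = [[0, 1, 2], [0, 5, 6], [7, 8, 9]]
results = answer_all_queries_naive(a)
print(len(results))  # 9
print(results[0])    # [0, 1]
```

With 1 million sublists this means about 3 million queries, each scanning all 1 million sublists, which is why the approach does not scale.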

This raises two questions:

  • Is a list a good data structure for this task?
  • Is there a more efficient algorithm?


Comments (3)

2025-01-30 20:58:29

You can hash all the indices in one go (a single O(N) pass over the N stored values), which then lets you answer each query in O(1) average time.

from collections import defaultdict

d = defaultdict(list)
a = [[0,1,2],[0,5,6],[7,8,9]]
queries = [0,1]
for i in range(len(a)):
    for element in a[i]:
        d[element].append(i)

for x in queries:
    print(d[x])

# prints
# [0, 1]
# [0]
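One caveat with the `defaultdict` above: indexing it with a value that never occurs, as `d[x]` does, silently inserts `x` with an empty list, so the index grows as queries run. A minimal sketch using `.get()`, which does not trigger the default factory, avoids that. `build_index` is a hypothetical helper name, and the build loop uses `enumerate` instead of `range(len(a))`.

```python
from collections import defaultdict


def build_index(a):
    # Hypothetical helper: map every value to the indices of the sublists containing it
    d = defaultdict(list)
    for i, sublist in enumerate(a):
        for element in sublist:
            d[element].append(i)
    return d


a = [[0, 1, 2], [0, 5, 6], [7, 8, 9]]
d = build_index(a)

# d[42] would insert 42 -> [] into the defaultdict;
# .get() leaves the index unchanged for values that never occur.
print(d.get(0, []))   # [0, 1]
print(d.get(42, []))  # []
print(42 in d)        # False
```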
与酒说心事 2025-01-30 20:58:29

Here is a proposed algorithm: iterate over the list of lists once to build a dict that maps each unique element to the indices of all the sublists it belongs to.

With this method, building the dict takes time proportional to the total number of elements in the list of lists, and each query is then a constant-time (on average) dict lookup.

This requires a dict of lists:

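def dict_of_indices(a):
    # Map each value to the indices of the sublists that contain it
    d = {}
    for i, l in enumerate(a):
        for e in l:
            d.setdefault(e, []).append(i)
    return d

a = [[0,1,2],[0,5,6],[7,8,9]]
d = dict_of_indices(a)
# d[value] raises KeyError for a value that never occurs; d.get(value, []) returns [] instead
print(d[0])
# [0, 1]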
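To check the claim that building the dict once beats rescanning the list for every query, here is a rough timing sketch. It reuses `dict_of_indices` from the answer; `naive_all_queries`, `indexed_all_queries`, `compare`, and the random test data are assumptions made for illustration. The data is generated with `random.Random.sample` so that no value repeats inside a sublist, because `dict_of_indices` records a sublist index twice when a value appears twice in it, and the two methods would then disagree.

```python
import random
import time


def dict_of_indices(a):
    # Same construction as the answer above: one pass over all elements
    d = {}
    for i, l in enumerate(a):
        for e in l:
            d.setdefault(e, []).append(i)
    return d


def naive_all_queries(a):
    # One full linear scan of `a` per queried value: O(len(a) ** 2) overall
    return [[i for i, sub in enumerate(a) if v in sub] for row in a for v in row]


def indexed_all_queries(a):
    # One O(total elements) build, then an average O(1) dict lookup per query
    d = dict_of_indices(a)
    return [d[v] for row in a for v in row]


def compare(n, seed=0):
    # Hypothetical data: n sublists of 3 distinct values drawn from range(n)
    rng = random.Random(seed)
    a = [rng.sample(range(n), 3) for _ in range(n)]
    t0 = time.perf_counter()
    slow = naive_all_queries(a)
    t1 = time.perf_counter()
    fast = indexed_all_queries(a)
    t2 = time.perf_counter()
    assert slow == fast  # both methods must return the same indices
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    for n in (200, 400, 800):
        naive_s, indexed_s = compare(n)
        print(f"n={n}: naive {naive_s:.4f}s, indexed {indexed_s:.4f}s")
```

Exact numbers depend on the machine, but the naive time should grow roughly fourfold each time `n` doubles, while the indexed time should grow roughly twofold.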
忘年祭陌 2025-01-30 20:58:29

You can create a dictionary that maps from a value to a set of row indices. Then, for each query, you can simply look up the value, returning an empty set if it doesn't exist anywhere in the 2D list:

from itertools import product

a = [[0,1,2],[0,5,6],[7,8,9]]

values = {}

# Visit every (row, col) position; this assumes every sublist has len(a[0]) elements
for row, col in product(range(len(a)), range(len(a[0]))):
    value_at_index = a[row][col]
    values.setdefault(value_at_index, set()).add(row)

print(values.get(0, set()))

This outputs:

{0, 1}

If you know in advance that the elements within each sublist are unique, then you can change the dictionary update line to:

values.setdefault(value_at_index, []).append(row)

and change the .get() call to:

values.get(0, [])

so that the indices come out in ascending order: rows are appended in the order they are visited, whereas a set does not guarantee any order.
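If values can repeat inside a sublist, or the sublists do not all have the length of `a[0]` (which the `product(...)` loop assumes), a minimal variant can still produce sorted, duplicate-free index lists: iterate over `set(sublist)` so each row contributes its index at most once per value, and rely on rows being visited in order to keep every list ascending. `index_unique_rows` is a hypothetical name, and the example data is made up to exercise both cases.

```python
def index_unique_rows(a):
    # Hypothetical helper: value -> ascending list of row indices, each row at most once.
    # set(sublist) skips values repeated within a row; rows are visited in order,
    # so every list stays sorted without needing a set in the output.
    values = {}
    for row, sublist in enumerate(a):
        for value in set(sublist):
            values.setdefault(value, []).append(row)
    return values


a = [[0, 0, 2], [0, 5], [7, 8, 9, 0]]  # repeated value and ragged rows
values = index_unique_rows(a)
print(values.get(0, []))  # [0, 1, 2]
print(values.get(4, []))  # []
```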
