有效的算法以检查列表中的值是否在列表中并重述元素的索引
我的目标是有效地在大量列表中查找(让我们以示例为1 mln条目,每个条目是由3个元素组成的列表),该元素的索引包含一定值:
例如,让我们以列表a
a = [[0,1,2],[0,5,6],[7,8,9]]
我想检验包含值0的元素的索引,因此我的函数将返回0,1
我的第一次尝试是:
def any_identical_value(elements,index):
for el in elements:
if el == index:
return True
return False
def get_dual_points(compliant_cells, index ):
compliant = [i for i,e in enumerate(compliant_cells) if any_identical_value(e,index)]
return compliant
result = get_dual_points(a,0)
该解决方案正常工作,但对于大量列表列表的效率高度低。特别是我的目标是执行主要列表中值总数的疑问,因此n_queries = len(a)*3
在上述9中
。 :
- 列表是完成此任务的好数据结构吗?
- 是否有更有效的算法解决方案?
My goal is to efficiently find in a large list of list (let's take as an example 1 mln of entries and each entry is a list composed of 3 elements) the index of the element containing a certain value:
e.g let's take the list a
a = [[0,1,2],[0,5,6],[7,8,9]]
i want to retrive the indices of the elements containing the value 0, hence my function would return 0,1
My first try has been the following:
def any_identical_value(elements,index):
for el in elements:
if el == index:
return True
return False
def get_dual_points(compliant_cells, index ):
compliant = [i for i,e in enumerate(compliant_cells) if any_identical_value(e,index)]
return compliant
result = get_dual_points(a,0)
The solution works correctly but it is highly inefficient for large list of lists. In particular my goal is to perform a number of quesries that is the total number of values in the primary list, hence n_queries = len(a)*3
, in the example above 9.
Here comes 2 questions:
- Is the list the good data structure to achieve this task?
- Is there a more efficient algorithm solution?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可以一次使用所有索引(单个
o(n)
通过),这将使您可以在o(1)
时间内回答查询。You can hash all indexes in one go (single
O(N)
pass), which would allow you to answer the queries inO(1)
time.这是一种提出的算法:曾经在列表中迭代,以构建一个地图每个独特的元素 all 它属于的sublists的索引。
通过这种方法,dict构建需要时间与列表列表中的元素总数成比例。然后每个查询都是恒定的。
这需要列表的命令:
Here is a proposed algorithm: iterate on the list of lists once, to build a dict that maps every unique element to all the indices of the sublists it belongs to.
With this method, the dict-building takes time proportional to the total number of elements in the list of lists. Then every query is constant-time.
This requires a dict of lists:
您可以创建一个词典,该字典从一个值映射到一组行索引。然后,对于每个查询,您可以简单地查找该值,如果它在2D列表中的任何地方都不存在,则返回一个空集
:
如果您提前知道每个子列表中的每个元素都是唯一的,那么您可以将字典更新行更改为:
并将
.get()
调用更改为:要维护输出中索引的排序。
You can create a dictionary that maps from a value to a set of row indices. Then, for each query, you can simply look up the value, returning an empty set if it doesn't exist anywhere in the 2D list:
This outputs:
If you know in advance that the elements within each sublist are unique, then you can change the dictionary update line to:
and change the
.get()
call to:to maintain the ordering of the indices in the output.