Find the n largest differences between two lists
I have two lists, old and new, with the same number of elements.
I'm trying to write an efficient function that takes n as a parameter, compares the elements of the two lists at the same positions (by index), finds the n largest differences, and returns the indices of those n elements.
I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?
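Python has no built-in dictionary that stays sorted by value, but for a one-off query the same effect can be had by building an ordinary dict from index to difference and sorting its items by value on demand. A minimal sketch; the helper name top_n_by_dict, the sample lists, and the choice of signed differences (new - old) are assumptions for illustration:

```python
from operator import itemgetter


def top_n_by_dict(old, new, n):
    """Return the indices of the n largest differences (new - old).

    Hypothetical helper: emulates a value-sorted dictionary by sorting
    an ordinary dict's items by value when the answer is needed.
    """
    # Map each index to the difference at that position.
    diffs = {i: b - a for i, (a, b) in enumerate(zip(old, new))}
    # Sort the (index, difference) pairs by difference, largest first.
    ranked = sorted(diffs.items(), key=itemgetter(1), reverse=True)
    return [i for i, _ in ranked[:n]]


old = [1, 5, 3, 8]
new = [4, 5, 10, 6]
print(top_n_by_dict(old, new, 2))  # [2, 0]
```

Sorting every item costs O(k log k) for lists of length k, which is why a heap-based approach can pay off when n is small.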
Answers (5)
Whenever you think "n largest", think heapq. This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.
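As a sketch of the heapq idea applied directly to the differences (this returns the difference values only, not their positions; the sample lists are made up):

```python
import heapq

old = [1, 5, 3, 8]
new = [4, 5, 10, 6]

# Differences at each position, produced lazily by a generator expression.
diffs = (b - a for a, b in zip(old, new))

# nlargest maintains a heap holding at most the requested number of items,
# instead of sorting every difference.
top_two = heapq.nlargest(2, diffs)
print(top_two)  # [7, 3]
```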
It just occurred to me that the above doesn't actually do what you asked for. You want an index! It's still very easy. I'll also use abs here, in case you want the absolute value of the difference.
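A minimal sketch of returning indices instead of values: run nlargest over the positions themselves and rank each position by the absolute difference at that index. The function name largest_diff_indices and the sample data are hypothetical:

```python
import heapq


def largest_diff_indices(old, new, n):
    """Indices of the n positions where old and new differ most, by absolute value."""
    # Rank positions 0..len(old)-1 by |new[i] - old[i]| and keep the top n.
    return heapq.nlargest(n, range(len(old)), key=lambda i: abs(new[i] - old[i]))


old = [1, 5, 3, 8]
new = [4, 5, 10, 6]
print(largest_diff_indices(old, new, 2))  # [2, 0]
```

Dropping the abs call ranks by signed difference instead, so a large decrease would no longer count as a large difference.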
Assuming the number of elements in the lists isn't huge, you could just compute all of the differences, sort them, and pick the first n. This would be O(k log k), where k is the length of your original lists.
If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module. This will be O(k log n) instead of O(k log k), which can be significant for k >> n.
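The two options can be compared side by side. This is a sketch assuming signed differences (new - old) and made-up data; the function names are hypothetical. The heapq documentation describes nlargest(n, iterable, key=key) as equivalent to sorted(iterable, key=key, reverse=True)[:n], so both functions should return the same indices:

```python
import heapq


def top_n_sorted(old, new, n):
    # O(k log k): sort every index by its difference, then slice off n.
    return sorted(range(len(old)), key=lambda i: new[i] - old[i], reverse=True)[:n]


def top_n_heap(old, new, n):
    # O(k log n): only a heap of n candidate indices is maintained.
    return heapq.nlargest(n, range(len(old)), key=lambda i: new[i] - old[i])


old = [1, 5, 3, 8, 2]
new = [4, 5, 10, 6, 5]
print(top_n_sorted(old, new, 3))  # [2, 0, 4]
print(top_n_heap(old, new, 3))    # [2, 0, 4]
```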
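Also, if your lists are really big, you'd probably be better off using itertools.izip instead of the regular zip function. (This applies to Python 2; in Python 3, itertools.izip was removed and the built-in zip already returns a lazy iterator.)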
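In Python 3 the same memory-saving effect comes from the built-in zip, which yields pairs on demand. A sketch, assuming Python 3 and a hypothetical helper name, that streams (index, difference) pairs into nlargest without building an intermediate list of differences:

```python
import heapq
from operator import itemgetter


def top_n_streaming(old, new, n):
    # zip() and enumerate() are lazy in Python 3, so no list of differences is built.
    pairs = ((i, b - a) for i, (a, b) in enumerate(zip(old, new)))
    # Rank the (index, difference) pairs by difference and keep the top n indices.
    return [i for i, _ in heapq.nlargest(n, pairs, key=itemgetter(1))]


old = [1, 5, 3, 8]
new = [4, 5, 10, 6]
print(top_n_streaming(old, new, 2))  # [2, 0]
```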
From your question, I think this is what you want:
In difference.py:
Execution:
If this is not what you want, consider elaborating on the question a little more.
itertools comes in handy for repetitive tasks. starmap converts tuples to *args; for reference, see the starmap documentation. With the max function you can get the largest difference, and the index function will help find its position: l.index(max(l)).
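A sketch of that approach with made-up data: starmap unpacks each (new, old) pair produced by zip into operator.sub, and l.index(max(l)) then finds the position of the single largest difference. Since max only yields one value, the last line swaps in heapq.nlargest (an addition, not part of the suggestion above) to get the n largest positions:

```python
import heapq
from itertools import starmap
from operator import sub

old = [1, 5, 3, 8]
new = [4, 5, 10, 6]

# starmap calls sub(b, a) for each (b, a) tuple, i.e. new[i] - old[i].
l = list(starmap(sub, zip(new, old)))
print(l)  # [3, 0, 7, -2]

# Position of the single largest difference (first one, if there are ties).
print(l.index(max(l)))  # 2

# Positions of the n largest differences (n = 2 here).
print(heapq.nlargest(2, range(len(l)), key=l.__getitem__))  # [2, 0]
```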
Here's a solution hacked together in numpy (disclaimer: I'm a novice in numpy, so there may be even slicker ways to do this). I didn't combine any of the steps, so it is very clear what each step is doing. The final value is a list of the indexes of the original lists, in order of the highest delta. Picking the top n is simply sorted_inds[:n], and retrieving the values from each list or from the delta list is trivial.
I don't know how it compares in performance to the other solutions, and any difference obviously won't show up on such a small data set, but it might be worth testing with your real data set, since my understanding is that numpy is very fast for numerical array operations.
Code
Output
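A minimal NumPy sketch of the steps described above (convert to arrays, take the deltas, argsort them in descending order, slice the first n). The name sorted_inds follows the description; deltas and the sample data are assumptions for illustration:

```python
import numpy as np

old = np.array([1, 5, 3, 8])
new = np.array([4, 5, 10, 6])

# Element-wise differences between the two arrays.
deltas = new - old

# argsort sorts ascending; reversing gives indices from the highest delta down.
sorted_inds = np.argsort(deltas)[::-1]

n = 2
print(sorted_inds[:n])          # [2 0]
print(deltas[sorted_inds[:n]])  # [7 3]
```

The default argsort kind is not stable, so the relative order of tied deltas is not guaranteed; passing kind="stable" makes it deterministic.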