找出两个列表之间的 n 个最大差异

发布于 2025-01-06 15:04:08 字数 343 浏览 0 评论 0原文

我有两个列表 oldnew,具有相同数量的元素。

我正在尝试编写一个高效的函数,它以 n 作为参数,比较两个列表中相同位置的元素(按索引),找到 n 最大差异,并返回这些 n 元素的索引。

我认为这最好通过值排序字典来解决,但是 不可用< /a> 在 Python 中(我不知道有任何库提供它)。也许有更好的解决方案?

I have two lists old and new, with the same number of elements.

I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.

I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

童话 2025-01-13 15:04:08

每当您想到“n最大”时,请考虑heapq

>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]

这将在 O(n log x) 时间内找到 x 最大的项目,其中 n 是列表中的项目总数;排序在 O(n log n) 时间内完成。

我只是想到上面的内容实际上并没有达到您的要求。你想要一个索引!还是很容易的。如果您想要差异的绝对值,我还将在此处使用 abs

>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]

Whenever you think "n largest", think heapq.

>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]

This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.

It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:

>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]
泛泛之交 2025-01-13 15:04:08

假设列表中的元素数量并不大,您可以对所有元素进行差异、排序并选择第一个 n

print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]

这将是 O(k log k)O(k log k) code> 其中 k 是原始列表的长度。

如果n明显小于k,最好的想法是使用nlargestheapq 模块提供的函数:

import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))

这​​将是 O(k log n)< /code> 而不是 O(k log k) 这对于 k >> 来说可能很重要。 n
另外,如果您的列表非常大,您可能最好使用 itertools.izip 而不是常规的 zip 函数。

Assuming the number of elements in the lists aren't huge, you could just difference all of them, sort, and pick the first n:

print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]

This would be O(k log k) where k is the length of your original lists.

If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module:

import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))

This will be O(k log n) instead of O(k log k) which can be significant for k >> n.
Also, if your lists are really big, you'd probably be better off using itertools.izip instead of the regular zip function.

允世 2025-01-13 15:04:08

从你的问题来看,我认为这就是你想要的:

In Difference.py

l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]


l3 = zip(l1, l2)

def f(n):
    diff_val = 0
    index_val = 0
    l4 = l3[:n]

    for x,y in l4:
        if diff_val < abs(x-y):
            diff_val = abs(x-y)
            elem = (x, y)
            index_val = l3.index(elem)

    print "largest diff: ", diff_val
    print "index of values:", index_val


n = input("Enter value of n:") 

f(n)

执行:

[avasal@avasal ]# python difference.py 
Enter value of n:4
largest diff:  116
index of values: 2
[avasal@avasal]#

如果这不是你想要的,请考虑再详细阐述一下问题..

From your question i think this is what you want:

In difference.py

l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]


l3 = zip(l1, l2)

def f(n):
    diff_val = 0
    index_val = 0
    l4 = l3[:n]

    for x,y in l4:
        if diff_val < abs(x-y):
            diff_val = abs(x-y)
            elem = (x, y)
            index_val = l3.index(elem)

    print "largest diff: ", diff_val
    print "index of values:", index_val


n = input("Enter value of n:") 

f(n)

Execution:

[avasal@avasal ]# python difference.py 
Enter value of n:4
largest diff:  116
index of values: 2
[avasal@avasal]#

if this is not what you want, consider elaborating the question little more..

下雨或天晴 2025-01-13 15:04:08
>>> l = []
... for i in itertools.starmap(lambda x, y: abs(x-y), itertools.izip([1,2,3],   [100,102,330])):
...     l.append(i)
>>> l
5: [99, 100, 327]

itertools 对于重复性任务来说非常方便。从 starmaptuples 转换为 *args。供参考使用 max 函数您将能够获得所需的结果。 index函数将帮助找到位置。

l.index(max(l)

>>> l.index(max(l))
6: 2
>>> l = []
... for i in itertools.starmap(lambda x, y: abs(x-y), itertools.izip([1,2,3],   [100,102,330])):
...     l.append(i)
>>> l
5: [99, 100, 327]

itertools comes handy for repetitive tasks. From starmap converts tuples to *args. For reference. With max function you will be able to get the desired result. index function will help to find the position.

l.index(max(l)

>>> l.index(max(l))
6: 2
过气美图社 2025-01-13 15:04:08

这是在 numpy 中破解的解决方案(免责声明,我是 numpy 的新手,所以可能有更巧妙的方法这样做)。我没有组合任何步骤,因此每个步骤的作用非常清楚。最终值是原始列表的索引列表,按最高增量的顺序排列。选择前 n 个只是 sorted_inds[:n],从每个列表或增量列表中检索值也很简单。

我不知道它与其他解决方案的性能比较如何,并且显然不会在如此小的数据集上显示出来,但可能值得用您的真实数据集进行测试,因为我的理解是 numpy 非常非常快用于数值数组运算。

代码

import numpy

list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

#Caculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))

#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))

#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))

输出

Delta:              [8 6 4 2 0 2 4 6 8]
Sorted indexes:     [4 3 5 2 6 1 7 0 8]
Reverse sort:       [8 0 7 1 6 2 5 3 4]

Here's a solution hacked together in numpy (disclaimer, I'm a novice in numpy so there may be even slicker ways to do this). I didn't combine any of the steps so it is very clear what each step was doing. The final value is a list of the indexes of the original lists in order of the highest delta. Picking the top n is simply sorted_inds[:n] and retrieving the values from each list or from the delta list is trivial.

I don't know how it compares in performance to the other solutions and it's obviously not going to show up with such a small data set, but it might be worth testing with your real data set as my understanding is that numpy is very very fast for numerical array operations.

Code

import numpy

list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

#Caculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))

#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))

#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))

Output

Delta:              [8 6 4 2 0 2 4 6 8]
Sorted indexes:     [4 3 5 2 6 1 7 0 8]
Reverse sort:       [8 0 7 1 6 2 5 3 4]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文