How do I get the indices of the N maximum values in a NumPy array?
NumPy provides a way to get the index of the maximum value of an array via np.argmax.
I would like something similar, but returning the indices of the N maximum values.
For instance, if I have an array [1, 3, 2, 4, 5], then nargmax(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].
Comments (21)
If you are dealing with NaNs and/or have trouble understanding np.argpartition, try pandas.DataFrame.sort_values.
This example gives the indices of the 3 largest non-NaN values. It is probably inefficient, but easy to read and customize.
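The original snippet is not shown here, so the following is only a minimal sketch of the idea. The sample array and the column name "value" are made up for illustration, and dropping NaNs before sorting is one possible choice rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data containing NaNs.
a = np.array([9.0, np.nan, 4.0, 7.0, np.nan, 1.0, 8.0])

df = pd.DataFrame({"value": a})

# Drop NaN rows, sort descending, and keep the index labels of the first 3 rows.
# With the default RangeIndex, those labels are positions in `a`.
top3 = df.dropna().sort_values("value", ascending=False).head(3).index.to_numpy()

print(top3)     # positions of the 3 largest non-NaN values
print(a[top3])  # the values themselves
```

Because the NaN rows are dropped first, asking for more values than there are non-NaN entries simply returns fewer indices.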
Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, take ind = np.argpartition(a, -4)[-4:].
Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards.
To get the top-k elements in sorted order in this way takes O(n + k log k) time.
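As a rough sketch of the two steps described above (partition first, then sort only the selected indices), assuming a small sample array chosen for illustration:

```python
import numpy as np

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])  # sample data

# Indices of the four largest elements, in no particular order.
ind = np.argpartition(a, -4)[-4:]

# Sort just those four indices by value, largest first.
ind_sorted = ind[np.argsort(a[ind])[::-1]]

print(a[ind])         # the four largest values, unordered
print(a[ind_sorted])  # the same values in descending order
```

Only the k selected entries go through argsort, which is where the k log k term comes from.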
The simplest I've been able to come up with is to argsort the array and take the indices of the last N elements.
This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven't been able to find one.
If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.
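A minimal sketch of the full-sort approach, assuming the function name nargmax from the question (it is not a NumPy function):

```python
import numpy as np

def nargmax(array, n):
    """Return the indices of the n largest values, largest first (full sort)."""
    array = np.asarray(array)
    # argsort is ascending, so reverse it and keep the first n entries.
    return np.argsort(array)[::-1][:n]

print(nargmax([1, 3, 2, 4, 5], n=3))
```

Reversing before slicing keeps n=0 well behaved, since a slice such as [-0:] would return the whole array.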
Simpler yet:
where n is the number of maximum values.
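The one-liner this answer refers to is not shown. One common way to shorten the argsort approach, offered here as an assumption rather than the answer's exact code, is to sort the negated array so the largest values come first:

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# Sorting the negated values puts the largest original values first.
idx = np.argsort(-arr)[:n]

print(idx)       # indices of the n largest values
print(arr[idx])  # the values, largest first
```

Negation is not suitable for unsigned integer arrays, where it wraps around instead of producing negative values.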
Use heapq.nlargest, passing the range of indices as the iterable and a lookup into the array as the key. The same approach works for regular Python lists.
If you use Python 2, use xrange instead of range.
Source: heapq — Heap queue algorithm
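A short sketch of the heapq approach for both a NumPy array and a plain list. The sample data is taken from the question; the choice of a.take and list.__getitem__ as key functions is one reasonable option.

```python
import heapq
import numpy as np

a = np.array([1, 3, 2, 4, 5])

# Rank the indices 0..len(a)-1 by the value stored at each index.
top3 = heapq.nlargest(3, range(len(a)), key=a.take)
print(top3)

# The same idea for a regular Python list.
lst = [1, 3, 2, 4, 5]
top3_list = heapq.nlargest(3, range(len(lst)), key=lst.__getitem__)
print(top3_list)
```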
If you happen to be working with a multidimensional array, you'll need to flatten the array and then unravel the resulting flat indices back into per-axis indices.
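The flatten-and-unravel idea can be sketched as follows. The helper name largest_indices and the sample array are illustrative, and n is assumed to be between 1 and the number of elements.

```python
import numpy as np

def largest_indices(ary, n):
    """Return per-axis index arrays of the n largest values, largest first."""
    flat = ary.ravel()
    # Pick the n largest positions in the flattened array (unordered).
    indices = np.argpartition(flat, -n)[-n:]
    # Order those positions by descending value.
    indices = indices[np.argsort(-flat[indices])]
    # Convert flat positions back to (row, col, ...) indices.
    return np.unravel_index(indices, ary.shape)

ary = np.array([[1, 8, 3],
                [7, 2, 9]])
rows, cols = largest_indices(ary, 3)
print(rows, cols)
print(ary[rows, cols])
```

The returned tuple can be used directly for fancy indexing, as the last line shows.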
Three Answers Compared For Coding Ease And Speed
Speed was important for my needs, so I tested three answers to this question.
Code from those three answers was modified as needed for my specific case.
I then compared the speed of each method.
Coding wise:
Complete Code for Test and Comparisons
Output with Speed Reports
NPE's Answer:
Fred Foo's Answer:
off99555's Answer:
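The test code and timings from this comparison are not shown. A comparison of this kind could be set up along the lines below; the three functions, the array size, and the repeat count are assumptions chosen for illustration, and no timing results are implied.

```python
import heapq
import timeit
import numpy as np

def top_n_argsort(a, n):
    # Full sort, then take the n largest.
    return np.argsort(a)[::-1][:n]

def top_n_argpartition(a, n):
    # Partial partition, then sort only the n selected indices.
    ind = np.argpartition(a, -n)[-n:]
    return ind[np.argsort(a[ind])[::-1]]

def top_n_heapq(a, n):
    # Heap-based selection over the index range.
    return np.array(heapq.nlargest(n, range(len(a)), key=a.take))

rng = np.random.default_rng(0)
a = rng.permutation(20_000)  # distinct values, so all methods must agree
n = 10

for func in (top_n_argsort, top_n_argpartition, top_n_heapq):
    seconds = timeit.timeit(lambda: func(a, n), number=5)
    print(f"{func.__name__}: {seconds / 5:.6f} s per call")
```

Using distinct values avoids tie-breaking differences, so the outputs of the three functions can be compared directly before trusting any timing.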
If you don't care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort through argsort.
Credits go to this question.
I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.
For multidimensional arrays, you can use the axis keyword to apply the partitioning along the intended axis, and then use the resulting indices to grab the items.
Note that this won't return a sorted result. If you need one, you can use np.argsort() along the intended axis instead.
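A small sketch of both variants for a 2D array. The sample data is made up, and np.take_along_axis (available in NumPy 1.15 and later) is used here as a convenient way to grab the items; the original answer may have indexed differently.

```python
import numpy as np

arr = np.array([[1, 9, 3, 7],
                [8, 2, 6, 4]])
N = 2

# Unordered indices of the N largest values in each row.
part_idx = np.argpartition(arr, -N, axis=1)[:, -N:]
part_vals = np.take_along_axis(arr, part_idx, axis=1)

# Full sort along the axis when order matters (ascending within each row).
sort_idx = np.argsort(arr, axis=1)[:, -N:]
sort_vals = np.take_along_axis(arr, sort_idx, axis=1)

print(part_vals)  # N largest per row, order not guaranteed
print(sort_vals)  # N largest per row, ascending
```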
np.argpartition is faster than np.argsort (which performs a full sort) when the array is large, because it only partitions the array so that the k largest entries end up in the last k positions. But the returned indices are NOT in ascending/descending order.
If you want the top-k indices in strictly ascending or descending order, np.argpartition alone won't return what you want.
Apart from sorting manually after np.argpartition, my solution is to use torch.topk from PyTorch, a tool for neural network construction that provides NumPy-like APIs with both CPU and GPU support. It's as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.
Note that torch.topk accepts a torch tensor and returns both the top k values and the top k indices as torch.Tensor objects. Like NumPy functions that take an axis argument, torch.topk accepts a dim argument, so you can handle multi-dimensional arrays/tensors.
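A brief sketch of torch.topk on data coming from NumPy, assuming PyTorch is installed. The sample arrays are illustrative; by default torch.topk returns the largest values in descending order.

```python
import numpy as np
import torch

a = np.array([1, 3, 2, 4, 5])

# Returns the k largest values and their indices, sorted largest first.
values, indices = torch.topk(torch.from_numpy(a), k=3)
print(indices.numpy(), values.numpy())

# For a 2D tensor, `dim` selects the axis to search along.
m = torch.tensor([[1.0, 9.0, 3.0],
                  [8.0, 2.0, 6.0]])
row_values, row_indices = torch.topk(m, k=2, dim=1)
print(row_indices)
```

Calling .numpy() on the results converts them back to NumPy arrays for CPU tensors.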
This will be faster than a full sort depending on the size of your original array and the size of your selection:
It, of course, involves tampering with your original array, which you could fix (if needed) by making a copy or by putting the original values back, whichever is cheaper for your use case.
Use:
Now the result list would contain N tuples (index, value) where value is maximized.
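The snippet for this answer is missing. One way to produce such a result list of (index, value) tuples, given here as an assumption, is heapq.nlargest over enumerate with the value as the key; your_list and N are placeholder names.

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]  # placeholder data
N = 3

# enumerate yields (index, value) pairs; rank them by the value (item 1).
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)
```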
Use:
It also works with 2D arrays. For example,
A vectorized 2D implementation using argpartition:
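The implementation itself is not shown, so the following is a sketch of one vectorized per-row version. The function name top_k_per_row and the sample matrix are hypothetical, and k is assumed to be between 1 and the number of columns.

```python
import numpy as np

def top_k_per_row(a, k):
    """Return (indices, values) of the k largest entries in each row, largest first."""
    # Partition the negated array so the k largest land in the first k columns.
    idx = np.argpartition(-a, k - 1, axis=1)[:, :k]
    vals = np.take_along_axis(a, idx, axis=1)
    # Order just those k columns by descending value.
    order = np.argsort(-vals, axis=1)
    return (np.take_along_axis(idx, order, axis=1),
            np.take_along_axis(vals, order, axis=1))

probas = np.array([[0.6, 0.1, 0.15, 0.05],
                   [0.1, 0.5, 0.15, 0.25],
                   [0.3, 0.1, 0.6, 0.0]])
idx, vals = top_k_per_row(probas, 2)
print(idx)
print(vals)
```

No Python-level loop over rows is needed; all rows are partitioned and ordered in one call each.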
I found it most intuitive to use np.unique.
The idea is that the unique method returns the indices of the input values. Then, from the max unique value and the indices, the position of the original values can be recreated.
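The answer's code is missing. A sketch of how this could work, assuming the "indices" come from return_inverse=True (an assumption, since the original call is not shown) and a 1-D input:

```python
import numpy as np

a = np.array([1, 3, 2, 4, 5, 5])
n = 2  # number of largest distinct values to keep

# `uniques` is sorted ascending; `inverse` maps each element of `a`
# to the position of its value inside `uniques`.
uniques, inverse = np.unique(a, return_inverse=True)
inverse = inverse.ravel()

# Elements whose value is among the n largest distinct values.
positions = np.flatnonzero(inverse >= len(uniques) - n)

print(positions)
print(a[positions])
```

With this reading, duplicates of a top value are all reported, so the result can contain more than n positions.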
The following is a very easy way to see the maximum elements and their positions. Here axis is the domain; for the 2D case, axis = 0 means the column-wise maximum and axis = 1 means the row-wise maximum. For higher dimensions, it depends on you.
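A minimal illustration of the axis argument with max and argmax, using a made-up 2D array:

```python
import numpy as np

M = np.array([[1, 9, 3],
              [8, 2, 6]])

# axis=0: one result per column (search down the rows).
col_max, col_pos = M.max(axis=0), M.argmax(axis=0)

# axis=1: one result per row (search across the columns).
row_max, row_pos = M.max(axis=1), M.argmax(axis=1)

print(col_max, col_pos)
print(row_max, row_pos)
```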
Here's a more complicated way that increases n if the nth value has ties:
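The original code is not shown. One possible sketch of tie-aware selection, with a hypothetical function name and n assumed to be between 1 and the array size, finds the n-th largest value and keeps everything at or above it:

```python
import numpy as np

def top_n_with_ties(a, n):
    """Indices of the n largest values, extended to include ties with the n-th value."""
    a = np.asarray(a)
    # Value of the n-th largest element.
    kth_value = np.partition(a, -n)[-n]
    # Everything at least that large, which may be more than n entries.
    idx = np.flatnonzero(a >= kth_value)
    # Largest first; a stable sort keeps tied entries in index order.
    return idx[np.argsort(-a[idx], kind="stable")]

print(top_n_with_ties([1, 4, 4, 5, 2], n=2))
```

Here the second largest value, 4, occurs twice, so three indices come back for n=2.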
I think the most time-efficient way is to manually iterate through the array and keep a k-size min-heap, as other people have mentioned.
I also came up with a brute-force approach:
Set the largest element to a large negative value after you use argmax to get its index. The next call of argmax will then return the second largest element.
You can log the original values of these elements and restore them if you want.
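Both ideas can be sketched briefly. The function names are hypothetical, the input is assumed to be 1-D, and the brute-force version works on a float copy and uses -inf as the "large negative value", which is a choice made here rather than something the answer specifies.

```python
import heapq
import numpy as np

def top_k_heap(values, k):
    """Keep a min-heap of the k largest (value, index) pairs seen so far."""
    if k <= 0:
        return []
    heap = []
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            # The new value beats the smallest of the current top k.
            heapq.heapreplace(heap, (v, i))
    return [i for _, i in sorted(heap, reverse=True)]

def top_k_argmax(a, k):
    """Repeated argmax on a copy, masking each maximum once it is found."""
    work = np.array(a, dtype=float)  # copy, so the caller's array is untouched
    found = []
    for _ in range(min(k, work.size)):
        i = int(np.argmax(work))
        found.append(i)
        work[i] = -np.inf
    return found

print(top_k_heap([1, 3, 2, 4, 5], 3))
print(top_k_argmax([1, 3, 2, 4, 5], 3))
```

Working on a copy avoids having to record and restore the original values, at the cost of extra memory.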
This code works for a 2D NumPy array:
It produces a boolean (true/false) mask marking the n largest entries, which can also be used to extract those elements from the array.
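A sketch of such a boolean mask, with made-up sample data. Using np.isin to build the mask is a choice made here; the original code is not shown.

```python
import numpy as np

mat = np.array([[1, 3],
                [2, 5]])
n = 2

# The n largest values over the whole array (axis=None sorts the flattened data).
n_largest_values = np.sort(mat, axis=None)[-n:]

# True where an entry equals one of those values.
tf_n_largest = np.isin(mat, n_largest_values)

# Boolean indexing extracts the matching elements.
n_largest_elems = mat[tf_n_largest]

print(tf_n_largest)
print(n_largest_elems)
```

If the array contains duplicates of a top value, the mask marks all of them, so more than n entries may be selected.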
When top_k << axis_length, it is better than argsort.
You can simply use a dictionary to find the top k values and indices in a NumPy array.
For example, if you want to find the top 2 maximum values and indices:
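The answer's code is missing, so this is only one way the dictionary idea could look; the variable names and sample array are illustrative.

```python
import numpy as np

a = np.array([1, 3, 2, 4, 5])
k = 2

# Map each index to its value.
index_to_value = dict(enumerate(a.tolist()))

# Sort the (index, value) pairs by value, largest first, and keep the first k.
top_k = sorted(index_to_value.items(), key=lambda item: item[1], reverse=True)[:k]
top_k_dict = dict(top_k)

print(top_k)
print(top_k_dict)
```

This sorts every entry in pure Python, so it is likely to be slower than the NumPy-based approaches on large arrays.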