How to sort a list and get the top K elements?
I have a vector of doubles. I want to sort it from highest to lowest and get the indices of the top K elements. std::sort just sorts in place and, I believe, does not return the indices. What would be a quick way to get the indices of the K largest elements?
You could use the nth_element STL algorithm: it partially reorders the range so that the N greatest elements end up first (this is the fastest way using the STL), and then you call std::sort on just those elements. Or you could use the partial_sort algorithm if you want the first K elements to come out sorted. :) Using plain std::sort alone is a poor fit: it is much slower than necessary for what you want. std::sort is a great STL algorithm, but it sorts the whole container, not just the first K elements. It's no accident that nth_element and partial_sort exist. ;)
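A minimal sketch of the partial_sort route, assuming you want the indices rather than the values: instead of reordering the doubles, it sorts a vector of indices with a comparator that looks up each index's value. The function name topKIndicesPartialSort is illustrative, not from the answer. partial_sort does roughly O(N log K) work, and only the first K positions come out sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the k largest values in v, ordered from largest to smallest.
// v itself is never modified; only a vector of indices is rearranged.
std::vector<std::size_t> topKIndicesPartialSort(const std::vector<double>& v, std::size_t k) {
    k = std::min(k, v.size());                          // guard against k > n
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // idx = 0, 1, ..., n-1
    // Sort only the first k positions by descending value; the tail is left in unspecified order.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
    idx.resize(k);                                      // keep just the top k
    return idx;
}
```

For {3.5, 9.1, 0.2, 7.7, 9.0, 1.4} and k = 3 it returns {1, 4, 3}, the positions of 9.1, 9.0 and 7.7.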
The first thing that comes to mind is somewhat hackish, but you could define a struct that stores both the double and its original index, then overload the < operator to sort based on the double:
Then you could retrieve the original indices from the struct.
Fuller example:
This will leave them sorted from smallest to largest, but you could overload the > operator instead and then pass std::greater to the sort function if you want them largest first.
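Since the struct code itself is not shown here, the following is a reconstruction of the idea, not the answerer's exact code; the names IndexedValue and topKIndicesStruct are hypothetical. It overloads both < and > on the value, sorts with std::greater so the largest come first, and reads the original indices back out. This is a full O(N log N) sort, so it is the simplest option rather than the fastest.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper type: a value plus the position it had in the original vector.
struct IndexedValue {
    double value;
    std::size_t index;
};

// Both comparisons look only at the double, so sorting reorders by value.
// Plain std::sort(items.begin(), items.end()) would use operator< (ascending order).
bool operator<(const IndexedValue& a, const IndexedValue& b) { return a.value < b.value; }
bool operator>(const IndexedValue& a, const IndexedValue& b) { return a.value > b.value; }

// Returns the original indices of the k largest values, largest first.
std::vector<std::size_t> topKIndicesStruct(const std::vector<double>& v, std::size_t k) {
    std::vector<IndexedValue> items;
    items.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) items.push_back({v[i], i});
    // std::greater<IndexedValue> calls operator>, giving descending order.
    std::sort(items.begin(), items.end(), std::greater<IndexedValue>());
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < k && i < items.size(); ++i) result.push_back(items[i].index);
    return result;
}
```

For {3.5, 9.1, 0.2, 7.7} and k = 2 it returns {1, 3}.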
OK, how about this?
Not sure about pre-canned algorithms, but take a look at selection algorithms: if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods.
If you can create an indexing class (like @user470379's answer: basically a class that encapsulates a pointer/index to the "real" data, which is read-only), then use a priority queue of maximum size K: add each unsorted element to the priority queue, and pop off the bottom-most (smallest) element whenever the queue reaches size K+1. In cases like N = 10^6, K = 100, this is much simpler and more efficient than a full sort.
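A sketch of the bounded priority-queue approach, assuming a std::pair<double, std::size_t> of (value, index) is good enough as the "indexing class". std::greater turns std::priority_queue into a min-heap, so top() is always the smallest kept element and is the one dropped when the size reaches K+1. The function name topKIndicesHeap is hypothetical. This costs about O(N log K) time and O(K) extra memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the indices of the k largest values, largest first, keeping at most k+1 entries in memory.
std::vector<std::size_t> topKIndicesHeap(const std::vector<double>& v, std::size_t k) {
    using Entry = std::pair<double, std::size_t>;  // (value, original index)
    // With std::greater, priority_queue becomes a min-heap: top() is the smallest kept entry.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < v.size(); ++i) {
        heap.emplace(v[i], i);
        if (heap.size() > k) heap.pop();  // size reached k+1: drop the smallest
    }
    std::vector<std::size_t> result;
    while (!heap.empty()) {               // pops in ascending order of value
        result.push_back(heap.top().second);
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // flip to largest first
    return result;
}
```

Because the heap yields the survivors smallest first, the final reverse is needed; for {3.5, 9.1, 0.2, 7.7, 9.0, 1.4} and k = 3 the result is {1, 4, 3}.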
So you actually need a structure that maps each double to its original index.
You could use the std::multimap class to perform this mapping; as Jason has noted, std::map does not allow duplicate keys. After you've done this, you can iterate over the first ten elements, since the map keeps its elements sorted by key (use std::greater<double> as the comparator, or iterate from the end, so that the largest values come first).
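A minimal sketch of the multimap idea, keyed on the double so the container does the sorting, with std::greater<double> so that iterating from begin() visits the largest values first. The function name topKIndicesMultimap is illustrative. Equal values are kept, and std::multimap inserts a new duplicate after the existing equal keys, so ties come out in original index order. Building the map costs O(N log N).

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Returns the indices of the k largest values, largest first.
std::vector<std::size_t> topKIndicesMultimap(const std::vector<double>& v, std::size_t k) {
    // Key = value, mapped = original index. multimap allows duplicate keys,
    // and std::greater<double> orders the keys from largest to smallest.
    std::multimap<double, std::size_t, std::greater<double>> byValue;
    for (std::size_t i = 0; i < v.size(); ++i) byValue.emplace(v[i], i);
    std::vector<std::size_t> result;
    for (auto it = byValue.begin(); it != byValue.end() && result.size() < k; ++it)
        result.push_back(it->second);
    return result;
}
```

For {2.0, 5.0, 5.0, 1.0} and k = 2 it returns {1, 2}: both 5.0 entries, in their original order.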
Use a multimap of the vector's (value, index) pairs to handle duplicates. Use reverse iterators to walk the results in descending order. The output is:
If you just want the vector indices after the sort, use this:
The top K entries are at positions 0 to K-1 and appear in descending order. This uses reverse iterators combined with the standard sort (using less<double>) to achieve descending order when iterated forward. Equivalently:
Sample code for the excellent nth_element solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.
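This is not the benchmark code referred to above, but a separate minimal sketch of the same nth_element idea applied to a vector of indices (the name topKIndicesNthElement is hypothetical). Calling std::nth_element with position k guarantees that every index before position k refers to a value at least as large as every index from k on, in linear time on average; the following std::sort then orders only those k indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the k largest values, largest first.
std::vector<std::size_t> topKIndicesNthElement(const std::vector<double>& v, std::size_t k) {
    k = std::min(k, v.size());
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // idx = 0, 1, ..., n-1
    auto byValueDesc = [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; };
    // Partition: afterwards, every index in idx[0, k) refers to a value >= every value
    // referenced by idx[k, n). No particular order inside either part.
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(), byValueDesc);
    // Optional: sort only the k winners, from largest to smallest.
    std::sort(idx.begin(), idx.begin() + k, byValueDesc);
    idx.resize(k);
    return idx;
}
```

For {3.5, 9.1, 0.2, 7.7, 9.0, 1.4} and k = 3 it returns {1, 4, 3}. If the top K do not need to be in order, the std::sort line can be dropped.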