How do I sort a list and get the top K elements? (STL)

Posted on 2024-09-28 01:17:04

I have a vector of doubles. I want to sort it from highest to lowest and get the indices of the top K elements. As far as I know, std::sort only sorts in place and does not return indices. What would be a quick way to get the indices of the K largest elements?

Comments (6)

野味少女 2024-10-05 01:17:04

You could use the STL algorithm nth_element. It does not return anything; it rearranges the range in place so that the N greatest elements end up grouped together at one end (in no particular order). This is the fastest way to do it with the STL, and you can then call sort on just those elements. Alternatively, use the partial_sort algorithm if you want the first K elements to be sorted (:

Using sort alone is a poor fit here: it is slower than necessary for this purpose, because it orders the whole container rather than just the first K elements. sort is a great STL algorithm, but it is no accident that nth_element and partial_sort exist ;)
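Since the question asks for indices rather than values, one way to apply this idea is to run partial_sort over a vector of indices and compare them by the values they point to. The following is only a minimal sketch under that assumption; the helper name top_k_indices is hypothetical and does not come from the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the thread): returns the indices of the k
// largest values, ordered from the largest value to the smallest.
std::vector<std::size_t> top_k_indices(const std::vector<double>& values,
                                       std::size_t k) {
    k = std::min(k, values.size());  // guard against k > N
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), 0);  // fill with 0, 1, 2, ...

    // Compare two indices by the values they refer to, larger value first.
    auto by_value_desc = [&values](std::size_t a, std::size_t b) {
        return values[a] > values[b];
    };

    // Only the first k positions are put in their final sorted order.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(), by_value_desc);
    idx.resize(k);
    return idx;
}
```

The original vector is left untouched. If the order among the top K does not matter, std::nth_element with the same comparator could replace std::partial_sort.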

一城柳絮吹成雪 2024-10-05 01:17:04

The first thing that comes to mind is somewhat hackish, but you could define a struct that stored both the double and its original index, then overload the < operator to sort based on the double:

struct s {
    double d;
    int index;
    // Order by the stored value only; the index just rides along.
    bool operator < (const s &other) const {
        return d < other.d;
    }
};

Then you could retrieve the original indices from the struct.

Fuller example:

vector<double> orig;
vector<s> v;
...
for (int i=0; i < orig.size(); ++i) {
    s s_temp;
    s_temp.d = orig[i];
    s_temp.index = i;
    v.push_back(s_temp);  // push the filled-in element, not the type name
}
sort(v.begin(), v.end());
//now just retrieve v[i].index

This will leave them sorted from smallest to largest, but you could overload the > operator instead and then pass std::greater<s>() to the sort function if you want descending order.
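The descending variant mentioned here could look roughly like the sketch below. The names IndexedValue and sorted_descending are hypothetical, chosen only for illustration; the approach itself (a struct carrying the value and its index, ordered by operator> through std::greater) is the one described in this answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Same idea as the struct in this answer; the name IndexedValue is hypothetical.
struct IndexedValue {
    double d;
    std::size_t index;
    bool operator>(const IndexedValue& other) const { return d > other.d; }
};

// Hypothetical helper: pairs each value with its index and sorts descending.
std::vector<IndexedValue> sorted_descending(const std::vector<double>& orig) {
    std::vector<IndexedValue> v;
    v.reserve(orig.size());
    for (std::size_t i = 0; i < orig.size(); ++i) {
        v.push_back(IndexedValue{orig[i], i});
    }
    // std::greater calls operator>, so the largest value ends up first.
    std::sort(v.begin(), v.end(), std::greater<IndexedValue>());
    return v;
}
```

After the call, the first K entries hold the K largest values, and their index members are the positions in the original vector.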

穿越时光隧道 2024-10-05 01:17:04

OK, how about this?

bool isSmaller (std::pair<double, int> x, std::pair<double, int> y)
{
   return x.first< y.first;
}

int main()
{
   //...
   //you have your vector<double> here, say name is d;
   std::vector<std::pair<double, int> > newVec(d.size());
   for(int i = 0; i < newVec.size(); ++i)
   {
      newVec[i].first = d[i];
      newVec[i].second = i;  //store the initial index
   }
   std::sort(newVec.begin(), newVec.end(), &isSmaller);
   //the sort is ascending, so the last k elements are the k largest; their second components are the initial indices
}

眼眸里的那抹悲凉 2024-10-05 01:17:04

I'm not sure about pre-canned algorithms, but take a look at selection algorithms; if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods than a full sort.

If you can create an indexing class (like @user470379's answer: basically a class that encapsulates a pointer/index to the "real" data, which is read-only), then use a priority queue of maximum size K: add each unsorted element to the priority queue, and pop off the bottom-most (smallest) element whenever the queue reaches size K+1. In cases like N = 10^6, K = 100, this is much simpler and more efficient than a full sort.
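The bounded priority queue described above can be sketched as follows. This is an illustration under assumptions, not the answerer's code: it uses std::pair<double, std::size_t> in place of a dedicated indexing class, and the helper name top_k_with_heap is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: indices of the k largest values, largest value first.
std::vector<std::size_t> top_k_with_heap(const std::vector<double>& values,
                                         std::size_t k) {
    typedef std::pair<double, std::size_t> Entry;  // (value, original index)

    // Min-heap: top() is the smallest of the entries kept so far.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    for (std::size_t i = 0; i < values.size(); ++i) {
        heap.push(Entry(values[i], i));
        if (heap.size() > k) {
            heap.pop();  // drop the smallest once the heap holds k + 1 entries
        }
    }

    // The heap pops smallest first, so fill the result from the back.
    std::vector<std::size_t> result(heap.size());
    for (std::size_t pos = result.size(); pos > 0; --pos) {
        result[pos - 1] = heap.top().second;
        heap.pop();
    }
    return result;
}
```

Each element costs at most one push and one pop on a heap of about K entries, so the whole pass is O(N log K) and never needs more than K + 1 entries in memory at once.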

夜访吸血鬼 2024-10-05 01:17:04

So you actually need a structure that maps each double to its corresponding index.

You could use the std::multimap class to perform this mapping. As Jason has noted, std::map does not allow duplicate keys.

std::vector<double> v; // assume it is populated already
std::multimap<double, int> m;
for (int i = 0; i < v.size(); ++i)
    m.insert(std::make_pair(v[i], i));
...

After you've done this, you can get the ten largest elements by iterating over the last ten entries (for example with reverse iterators), because a multimap keeps its elements sorted by key in ascending order.

浪菊怪哟 2024-10-05 01:17:04

Use a multimap of the vector's (value, index) pairs to handle duplicates. Use reverse iterators to walk the results in descending order.

#include <iostream>
#include <map>
#include <vector>
using namespace std;

multimap<double, size_t> indices;
vector<double> values;

values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);

size_t i = 0;
for(vector<double>::const_iterator iter = values.begin(); 
        iter != values.end(); ++iter, ++i)
{
    indices.insert(make_pair(*iter, i));
}

i = 0;
size_t limit = 2;
for (multimap<double, size_t>::const_reverse_iterator iter = indices.rbegin(); 
    iter != indices.rend() && i < limit; ++iter, ++i)
{
    cout << "Value " << iter->first << " index " << iter->second << endl;
}

Output is

Value 4 index 3

Value 3 index 2

If you just want the vector indices after sort, use this:

#include <algorithm>
#include <vector>
using namespace std;

vector<double> values;

values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);

sort(values.rbegin(), values.rend());

The top K entries are indexed by 0 to K-1 and appear in descending order. This uses reverse iterators combined with standard sort (which uses less<double> by default) to achieve descending order when iterated forward. Equivalently:

sort(values.rbegin(), values.rend(), less<double>());

Sample code for the excellent nth_element solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.

vector<double> values;

for (size_t i = 0; i < 500000; ++i)
{
    values.push_back(rand());
}

nth_element(values.begin(), values.begin()+375000, values.end());
sort(values.begin()+375000, values.end());

vector<double> results(values.rbegin(), values.rbegin() + values.size() - 375000);