How do I sort a list and get the top K elements? (STL)

Posted 2024-09-28 01:17:04, 94 words, 1 view, 0 comments

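I have a vector of doubles. I want to sort it from highest to lowest and get the indices of the top K elements. std::sort just sorts in place and, as far as I can tell, does not return the indices. What is a quick way to get the indices of the K largest elements?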

野味少女 2024-10-05 01:17:04

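You can use the nth_element STL algorithm. Given a descending comparator, it rearranges the range so that the N greatest elements come first, in no particular order (this is the fastest way using the STL), and you can then call sort on just those elements. Alternatively, if you want the top K elements sorted, you can use the partial_sort algorithm.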

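Using sort alone is a poor choice here: it is very slow for what you want. sort is a great STL algorithm, but it is meant for sorting the whole container, not just the top K elements. It is no accident that nth_element and partial_sort exist.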
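Since the question asks for indices rather than values, one way to apply partial_sort is to sort a vector of indices and compare through the original data. The following is a minimal sketch under that assumption; the helper name top_k_indices is made up for illustration and does not come from the thread, and the lambda assumes C++11 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (the name is an assumption, not from the thread):
// returns the indices of the k largest values, largest value first.
std::vector<std::size_t> top_k_indices(const std::vector<double>& values, std::size_t k) {
    k = std::min(k, values.size());              // never ask for more than we have
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    // Only the first k positions end up sorted; the rest are left in unspecified order.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&values](std::size_t a, std::size_t b) {
                          return values[a] > values[b];  // descending by value
                      });
    idx.resize(k);
    return idx;
}
```

For the values {1.0, 4.0, 2.0, 3.0} and k = 2, this yields the indices 1 and 3. The original vector is left untouched, which is the main reason to sort indices instead of the data itself. The relative order of equal values is unspecified.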

一城柳絮吹成雪 2024-10-05 01:17:04

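The first thing that comes to mind is somewhat hackish, but you could define a struct that stores both the double and its original index, then overload the < operator to sort based on the double: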

struct s {
    double d;
    int index;
    // Order by the stored value only; the index just rides along.
    bool operator<(const s &other) const {
        return d < other.d;
    }
};

Then you can retrieve the original indices from the struct.

Fuller example:

vector<double> orig;
vector<s> v;
...
for (size_t i = 0; i < orig.size(); ++i) {
    s s_temp;
    s_temp.d = orig[i];
    s_temp.index = static_cast<int>(i);  // remember the original position
    v.push_back(s_temp);
}
sort(v.begin(), v.end());
//now just retrieve v[i].index

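This leaves them sorted from smallest to largest, but you could overload the > operator instead and then pass std::greater to the sort function if you want descending order.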

穿越时光隧道 2024-10-05 01:17:04

OK, how about this?

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Compare on the value only; .second carries the original index.
bool isSmaller (const std::pair<double, int> &x, const std::pair<double, int> &y)
{
   return x.first < y.first;
}

int main()
{
   //...
   //you have your vector<double> here, say name is d;
   std::vector<std::pair<double, int> > newVec(d.size());
   for(std::size_t i = 0; i < newVec.size(); ++i)
   {
      newVec[i].first = d[i];
      newVec[i].second = static_cast<int>(i);  //store the initial index
   }
   std::sort(newVec.begin(), newVec.end(), &isSmaller);
   //newVec is now in ascending order, so the k largest values are the last k elements;
   //iterate from the back and read .second to get their initial indices
}

眼眸里的那抹悲凉 2024-10-05 01:17:04

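I'm not sure about pre-fab algorithms, but take a look at selection algorithms; if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods than a full sort.

If you can create an indexing class (like @user470379's answer: basically a class that encapsulates a pointer or index to the "real" data, which stays read-only), then use a priority queue of maximum size K. Add each unsorted element to the priority queue, and whenever the queue reaches size K+1, pop off the bottom-most (smallest) element. In cases like N = 10⁶, K = 100, this is much simpler and more efficient than a full sort.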
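The bounded priority queue described above can be sketched as follows. This is only an illustration under some assumptions: it uses std::pair<double, std::size_t> in place of a dedicated indexing class, and the function name top_k_indices_heap is invented here, not taken from the thread.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: indices of the k largest values, largest value first.
std::vector<std::size_t> top_k_indices_heap(const std::vector<double>& values, std::size_t k) {
    typedef std::pair<double, std::size_t> Entry;  // (value, original index)
    // std::greater turns the queue into a min-heap: top() is the smallest entry kept so far.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
    for (std::size_t i = 0; i < values.size(); ++i) {
        heap.push(Entry(values[i], i));
        if (heap.size() > k) {
            heap.pop();  // the queue reached size k+1: drop the smallest entry
        }
    }
    // Popping yields ascending order, so fill the result from the back.
    std::vector<std::size_t> result(heap.size());
    for (std::size_t pos = result.size(); pos > 0; --pos) {
        result[pos - 1] = heap.top().second;
        heap.pop();
    }
    return result;
}
```

The queue never holds more than K+1 entries, so the work is roughly O(N log K) and the extra memory is O(K), which is where the advantage over a full sort comes from when N is much larger than K.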

夜访吸血鬼 2024-10-05 01:17:04

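So you actually need a structure that maps indices to the corresponding doubles.

You could use the std::multimap class to perform this mapping. As Jason has noted, std::map does not allow duplicate keys.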

std::vector<double> v; // assume it is populated already
std::multimap<double, int> m;
for (int i = 0; i < v.size(); ++i)
    m.insert(std::make_pair(v[i], i));
...

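After you've done this you can iterate over ten elements in key order, since the map keeps its elements sorted by key. The order is ascending, so start from rbegin() if you want the largest values first.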

浪菊怪哟 2024-10-05 01:17:04

Use a multimap of the vector's (value, index) pairs to handle duplicates. Use reverse iterators to walk the results in descending order.

#include <cstddef>
#include <iostream>
#include <map>
#include <vector>
using namespace std;

int main()
{
    multimap<double, size_t> indices;  // value -> original index, kept sorted by value
    vector<double> values;

    values.push_back(1.0);
    values.push_back(2.0);
    values.push_back(3.0);
    values.push_back(4.0);

    size_t i = 0;
    for (vector<double>::const_iterator iter = values.begin();
            iter != values.end(); ++iter, ++i)
    {
        indices.insert(make_pair(*iter, i));
    }

    // Walk from the largest key down, stopping after `limit` entries.
    i = 0;
    size_t limit = 2;
    for (multimap<double, size_t>::const_reverse_iterator iter = indices.rbegin();
        iter != indices.rend() && i < limit; ++iter, ++i)
    {
        cout << "Value " << iter->first << " index " << iter->second << endl;
    }
}

Output is

Value 4 index 3
Value 3 index 2

If you just want the vector indices after the sort, use this:

#include <algorithm>
#include <vector>
using namespace std;

vector<double> values;

values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);

sort(values.rbegin(), values.rend());

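The top K entries are indexed by 0 to K-1 and appear in descending order. This uses reverse iterators combined with the standard sort (which uses less<double>) to achieve descending order when iterated forward. Equivalently: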

sort(values.rbegin(), values.rend(), less<double>());

Sample code for the excellent nth_element solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.

vector<double> values;

for (size_t i = 0; i < 500000; ++i)
{
    values.push_back(rand());
}

nth_element(values.begin(), values.begin()+375000, values.end());
sort(values.begin()+375000, values.end());

vector<double> results(values.rbegin(), values.rbegin() + values.size() - 375000);
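The sample above rearranges the values themselves, so the original positions are lost. If the indices are what you need, the same two-step idea (nth_element, then sort only the top part) could be applied to an index vector instead. This is a sketch under that assumption; top_k_indices_nth is a made-up name, and the lambda assumes C++11 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest values, largest value first.
std::vector<std::size_t> top_k_indices_nth(const std::vector<double>& values, std::size_t k) {
    k = std::min(k, values.size());
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    auto by_value_desc = [&values](std::size_t a, std::size_t b) {
        return values[a] > values[b];
    };
    // Step 1: partition so the k indices with the largest values come first (unordered).
    if (k < idx.size()) {
        std::nth_element(idx.begin(), idx.begin() + k, idx.end(), by_value_desc);
    }
    // Step 2: keep only those k indices and sort just them.
    idx.resize(k);
    std::sort(idx.begin(), idx.end(), by_value_desc);
    return idx;
}
```

Compared with sorting all N indices, only the K survivors go through the full sort, and the data vector itself is never modified.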
