How to quickly get a sorted subvector from a sorted vector

Posted 2024-10-05 05:57:17


I have a data structure like this:

struct X {
  float value;
  int id;
};

I also have a vector of those, of size N (think 100,000), sorted by value. It stays constant during the execution of the program:

std::vector<X> values;

Now, I want to write a function

void subvector(std::vector<X> const& values, 
               std::vector<int> const& ids, 
               std::vector<X>& out /*, 
               helper data here */);

that quickly fills the out parameter with the sorted subset of values selected by the passed ids (M of them, where M < N, about 0.8 N). Memory is not an issue, and this will be done repeatedly, so building lookup tables (the helper data in the function parameters) or anything else that is done only once is entirely OK.
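To pin down the expected output, here is a deliberately slow reference sketch. It assumes an id -> offset lookup table `lut` like the one in the solution below, stored as a dense `std::vector<int>` indexed by id (so ids are assumed to be small non-negative integers); `subvector_reference` is a hypothetical name. It gathers the selected offsets, sorts them and copies, which costs O(M log M), i.e. the kind of approach the question wants to avoid.

```cpp
#include <algorithm>
#include <vector>

struct X {
  float value;
  int id;
};

// Slow reference version, only to pin down the expected output.
// Assumes lut is a dense std::vector indexed by id: lut[id] == offset of id in values.
// Cost is O(M log M) because of the sort.
void subvector_reference(std::vector<X> const& values,
                         std::vector<int> const& ids,
                         std::vector<int> const& lut,
                         std::vector<X>& out) {
  std::vector<int> offsets;
  offsets.reserve(ids.size());
  for (int id : ids) offsets.push_back(lut[id]);
  // values is sorted by value, so ascending offsets mean ascending values.
  std::sort(offsets.begin(), offsets.end());
  out.clear();
  out.reserve(offsets.size());
  for (int off : offsets) out.push_back(values[off]);
}
```

For example, with values {0.1, id 7}, {0.5, id 3}, {0.9, id 5} and ids {5, 7}, out holds the id-7 element followed by the id-5 element: the output follows the order of values, not the order of ids.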

My solution so far:

1. Build a lookup table lut mapping id -> offset in values (one-time preparation, so it adds no per-call cost).
2. Create a std::vector<X> tmp of size N, filled with invalid ids (linear in N).
3. For each id, copy values[lut[id]] to tmp[lut[id]] (linear in M).
4. Loop over tmp, copying the valid items to out (linear in N).
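The four steps above can be sketched as follows. This is not the linked ideone code; it assumes `lut` is a dense `std::vector<int>` indexed by id, that real ids are non-negative so `-1` can serve as the "invalid id" sentinel, and `build_lut` and `kInvalid` are hypothetical names.

```cpp
#include <vector>

struct X {
  float value;
  int id;
};

// Step 1 (done once): lut[id] = offset of that id in values.
// Assumes ids are non-negative and no larger than max_id.
std::vector<int> build_lut(std::vector<X> const& values, int max_id) {
  std::vector<int> lut(max_id + 1, -1);
  for (int i = 0; i < static_cast<int>(values.size()); ++i) lut[values[i].id] = i;
  return lut;
}

void subvector(std::vector<X> const& values,
               std::vector<int> const& ids,
               std::vector<X>& out,
               std::vector<int> const& lut) {
  const int kInvalid = -1;  // sentinel meaning "slot not selected"
  // Step 2: temporary of size N, every slot marked invalid (linear in N).
  std::vector<X> tmp(values.size(), X{0.0f, kInvalid});
  // Step 3: put each selected element at its sorted position (linear in M).
  for (int id : ids) tmp[lut[id]] = values[lut[id]];
  // Step 4: compact the valid slots into out, keeping their order (linear in N).
  out.clear();
  out.reserve(ids.size());
  for (X const& x : tmp)
    if (x.id != kInvalid) out.push_back(x);
}
```

Each selected element is copied twice here (into tmp, then into out), which is the overhead the question is asking about.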

This is linear in N (since N is larger than M), but the temporary vector and the repeated copying bug me. Is there a faster way to do this? Note that M will be close to N, so anything that is O(M log N) is unfavourable.

Edit: http://ideone.com/xR8Vp is a sample implementation of the algorithm above, to make the desired output clear and to show that it is doable in linear time. The question is whether the temporary vector can be avoided or the whole thing sped up in some other way; anything that is not linear will not be faster :).


大海や 2024-10-12 05:57:17


An alternative approach you could try is to look up the ids in a hash table instead of a vector:

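#include <unordered_set>
#include <vector>

// Keeps the elements of values whose id is in ids. values is already sorted
// and the scan preserves its order, so out comes out sorted without a re-sort.
void subvector(std::vector<X> const& values,
               std::unordered_set<int> const& ids,
               std::vector<X>& out) {
    out.clear();
    out.reserve(ids.size());
    for (X const& x : values) {
        if (ids.find(x.id) != ids.end()) {  // expected O(1) hash lookup
            out.push_back(x);
        }
    }
}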

This runs in linear time, since unordered_set::find takes constant expected time (assuming the ints hash well). However, I suspect it may not be as fast in practice as the vector-based approach you described initially.
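One way to combine this answer's single ordered pass over values with vector lookups instead of hashing is a byte-per-element flag array. This is a hedged sketch, not part of the original answer: it assumes the question's id -> offset table `lut` is a dense `std::vector<int>`, and `subvector_marked` is a hypothetical name. Compared with the tmp approach, each selected X is copied once instead of twice, and the N-sized temporary holds bytes rather than full X objects. It is still linear in N, and whether it is measurably faster would need benchmarking.

```cpp
#include <cstddef>
#include <vector>

struct X {
  float value;
  int id;
};

// Same single pass over values as the hash-set version, but membership is
// one byte per position of values, found via lut[id] (id -> offset).
void subvector_marked(std::vector<X> const& values,
                      std::vector<int> const& ids,
                      std::vector<int> const& lut,
                      std::vector<X>& out) {
  // One byte per element instead of a full temporary X (linear in N).
  std::vector<char> selected(values.size(), 0);
  for (int id : ids) selected[lut[id]] = 1;  // linear in M
  out.clear();
  out.reserve(ids.size());
  // values is sorted, so copying in index order keeps out sorted (linear in N).
  for (std::size_t i = 0; i < values.size(); ++i)
    if (selected[i]) out.push_back(values[i]);
}
```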
