How to implement a sparse vector class

Posted 2024-09-01 10:24:58


I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that differ from their default-constructed value.
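To pin down that invariant (only non-default values are stored, so writing T() removes an entry), here is a minimal reference model. It deliberately uses std::map rather than either array layout discussed below, and the name map_sparse_vector is hypothetical; it is meant only as a simple correctness oracle to test a faster implementation against. It assumes T is default-constructible and equality-comparable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>  // only used by the usage example

// Hypothetical reference model: the "store only non-default values"
// invariant expressed with std::map, not either library's layout.
template <typename T>
class map_sparse_vector {
public:
    explicit map_sparse_vector(std::size_t n) : size_(n) {}

    // Writing the default value erases the entry instead of storing it.
    void set(std::size_t i, const T& v) {
        assert(i < size_);
        if (v == T()) data_.erase(i);
        else data_[i] = v;
    }

    // Indices that are not stored read back as a default-constructed T.
    T get(std::size_t i) const {
        assert(i < size_);
        auto it = data_.find(i);
        return it == data_.end() ? T() : it->second;
    }

    std::size_t size() const { return size_; }       // logical length
    std::size_t nnz() const { return data_.size(); }  // stored entries

private:
    std::size_t size_;
    std::map<std::size_t, T> data_;
};
```

For example, with map_sparse_vector<std::string> v(10), calling v.set(3, "x") stores one entry, and a later v.set(3, "") removes it again.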

So sparse_vector would store lazily sorted (index, value) pairs for every index whose value is not T().
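One way to read "lazily sorted" is sketched below: writes are appended unsorted, and the first read after a write sorts the buffer, keeps only the most recent write for each index, and drops entries equal to T(). This interpretation is an assumption, and lazy_sparse_vector, set, get and normalize are hypothetical names; for brevity it uses a single vector of pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class lazy_sparse_vector {
public:
    // O(1) amortized append; ordering is restored on the next read.
    void set(std::size_t i, const T& v) {
        data_.emplace_back(i, v);
        sorted_ = false;
    }

    // Reads normalize first, then binary-search by index.
    T get(std::size_t i) const {
        normalize();
        auto it = std::lower_bound(
            data_.begin(), data_.end(), i,
            [](const std::pair<std::size_t, T>& e, std::size_t key) {
                return e.first < key;
            });
        return (it != data_.end() && it->first == i) ? it->second : T();
    }

    std::size_t nnz() const { normalize(); return data_.size(); }

private:
    // stable_sort keeps equal indices in write order, so the last element
    // of each run of equal indices is the most recent write.
    void normalize() const {
        if (sorted_) return;
        std::stable_sort(data_.begin(), data_.end(),
                         [](const auto& a, const auto& b) { return a.first < b.first; });
        std::vector<std::pair<std::size_t, T>> out;
        for (std::size_t k = 0; k < data_.size(); ++k) {
            bool last_of_run = k + 1 == data_.size() ||
                               data_[k + 1].first != data_[k].first;
            // Keep the latest write unless it reset the slot to T().
            if (last_of_run && !(data_[k].second == T())) out.push_back(data_[k]);
        }
        data_.swap(out);
        sorted_ = true;
    }

    mutable std::vector<std::pair<std::size_t, T>> data_;
    mutable bool sorted_ = true;
};
```

The design choice here is to make writes cheap and pay for ordering in bulk, which suits code that fills the vector first and reads it afterwards.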

I am basing my implementation on the sparse vectors in existing numeric libraries, although mine will also handle non-numeric types T. I looked at boost::numeric::ublas::coordinate_vector and Eigen::SparseVector.

Both store:

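```cpp
size_t* indices_;  // dynamic array of stored indices
T* values_;        // dynamic array; values_[k] belongs to indices_[k]
int size_;         // number of entries in use
int capacity_;     // allocated length of both arrays
```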
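For comparison, here is a minimal sketch of that two-array (structure-of-arrays) layout. It swaps the raw pointers plus size_/capacity_ for two std::vectors and keeps the indices sorted eagerly to stay short; soa_sparse_vector and its members are hypothetical names, not the Boost or Eigen API. The point to notice is that a lookup binary-searches indices_ alone and reads values_ only once the position is known, while every insert or erase has to update both arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays: indices_[k] pairs with values_[k]; indices_ stays sorted.
template <typename T>
class soa_sparse_vector {
public:
    void set(std::size_t i, const T& v) {
        auto it = std::lower_bound(indices_.begin(), indices_.end(), i);
        const auto k = it - indices_.begin();
        const bool present = it != indices_.end() && *it == i;
        if (v == T()) {                 // default value: drop the entry if stored
            if (present) {
                indices_.erase(it);
                values_.erase(values_.begin() + k);
            }
        } else if (present) {
            values_[static_cast<std::size_t>(k)] = v;  // overwrite in place
        } else {                        // two inserts, possibly two reallocations
            indices_.insert(it, i);
            values_.insert(values_.begin() + k, v);
        }
    }

    T get(std::size_t i) const {
        // The search itself reads only the index array.
        auto it = std::lower_bound(indices_.begin(), indices_.end(), i);
        if (it == indices_.end() || *it != i) return T();
        return values_[static_cast<std::size_t>(it - indices_.begin())];
    }

    std::size_t nnz() const { return indices_.size(); }

private:
    std::vector<std::size_t> indices_;
    std::vector<T> values_;
};
```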

Why don't they simply use a single vector of pairs instead?

```cpp
vector<pair<size_t, T>> data_;  // one buffer of (index, value) entries
```
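The vector-of-pairs alternative can be sketched the same way so the two layouts compare directly; aos_sparse_vector is a hypothetical name. Each insert or erase now touches a single buffer, and iteration can expose the underlying vector's const_iterator directly, where the two-array layout would need a custom "zip" iterator that dereferences to an (index, value) proxy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Array-of-structs: one buffer of (index, value) pairs sorted by index.
template <typename T>
class aos_sparse_vector {
public:
    using entry = std::pair<std::size_t, T>;
    using const_iterator = typename std::vector<entry>::const_iterator;

    void set(std::size_t i, const T& v) {
        auto it = std::lower_bound(data_.begin(), data_.end(), i, less_index);
        const bool present = it != data_.end() && it->first == i;
        if (v == T()) {
            if (present) data_.erase(it);   // one erase, one buffer
        } else if (present) {
            it->second = v;
        } else {
            data_.insert(it, entry(i, v));  // one insert, at most one reallocation
        }
    }

    T get(std::size_t i) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), i, less_index);
        return (it != data_.end() && it->first == i) ? it->second : T();
    }

    // Iteration reuses the vector's own iterator; no proxy iterator is needed.
    const_iterator begin() const { return data_.begin(); }
    const_iterator end() const { return data_.end(); }
    std::size_t nnz() const { return data_.size(); }

private:
    // Orders an entry against a bare index for std::lower_bound.
    static bool less_index(const entry& e, std::size_t key) { return e.first < key; }

    std::vector<entry> data_;
};
```

A range-based for over an aos_sparse_vector then yields std::pair<std::size_t, T> entries in index order.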

My main question: what are the pros and cons of each approach, and which is better overall?

The vector of pairs manages size_ and capacity_ for you and simplifies the accompanying iterator classes. It also uses one memory block instead of two, so it incurs half as many reallocations and might have better locality of reference.
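The reallocation argument can be checked roughly by counting capacity changes while appending n entries to each layout. This assumes both layouts grow by appending with std::vector's default growth policy and that T is double; push_and_check, soa_reallocs and aos_reallocs are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Appends one element and reports whether the buffer was reallocated.
template <typename V>
bool push_and_check(V& v, const typename V::value_type& x) {
    const auto before = v.capacity();
    v.push_back(x);
    return v.capacity() != before;
}

// Reallocations when appending n entries to the two-array layout.
inline std::size_t soa_reallocs(std::size_t n) {
    std::vector<std::size_t> indices;
    std::vector<double> values;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += push_and_check(indices, i);
        count += push_and_check(values, 1.0);
    }
    return count;
}

// Reallocations when appending n entries to the vector-of-pairs layout.
inline std::size_t aos_reallocs(std::size_t n) {
    std::vector<std::pair<std::size_t, double>> data;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) count += push_and_check(data, {i, 1.0});
    return count;
}
```

With a growth policy that depends only on the element count, as in common standard library implementations, soa_reallocs(n) typically comes out at exactly twice aos_reallocs(n). The standard does not mandate a growth factor, so treat that ratio as typical rather than guaranteed.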

The two-array solution might search faster, since during a search the cache lines are filled with index data only. Might there also be some alignment advantage if T is an 8-byte type?
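To make the cache-line and alignment speculation concrete, the functions below report, for a given T, how many bytes one stored entry costs in each layout and how many index keys a binary search sees per cache line. The 64-byte line size is an assumption (common on current x86-64 and many ARM cores), the helper names are hypothetical, and all results are implementation-defined. On a typical LP64 target such as x86-64 Linux, std::pair<size_t, float> is padded to 16 bytes while the two-array layout spends 12; with an 8-byte T such as double both layouts cost 16 bytes per entry, so padding mostly matters when T is smaller than size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed cache-line size in bytes; not queried from the hardware.
constexpr std::size_t kCacheLine = 64;

// Bytes per stored entry in the vector-of-pairs layout, including any padding.
template <typename T>
constexpr std::size_t aos_entry_bytes() { return sizeof(std::pair<std::size_t, T>); }

// Bytes per stored entry in the two-array layout (arrays have no padding
// between elements).
template <typename T>
constexpr std::size_t soa_entry_bytes() { return sizeof(std::size_t) + sizeof(T); }

// Index keys a search reads per cache line in the vector-of-pairs layout.
template <typename T>
constexpr std::size_t aos_keys_per_line() { return kCacheLine / aos_entry_bytes<T>(); }

// Index keys a search reads per cache line when scanning indices_ alone.
constexpr std::size_t soa_keys_per_line() { return kCacheLine / sizeof(std::size_t); }
```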

It seems to me that the vector of pairs is the better solution, yet both libraries chose the other one. Why?
