如何实现稀疏向量类
我正在实现一个模板化的稀疏向量类。它就像一个向量,但它只存储与其默认构造值不同的元素。
因此,sparse_vector 将存储值不是 T() 的所有索引的延迟排序索引值对。
我的实现基于数值库中现有的稀疏向量——尽管我的实现也可以处理非数值类型 T 。我查看了 boost::numeric::ublas::coordinate_vector
和 eigen::SparseVector
。
两者都存储:
size_t* indices_; // a dynamic array
T* values_; // a dynamic array
int size_;
int capacity_;
为什么他们不简单地使用
vector<pair<size_t, T>> data_;
我的主要问题是这两个系统的优缺点是什么,最终哪个更好?
对向量为您管理 size_ 和capacity_,并简化随附的迭代器类;它也有一个内存块而不是两个,因此它会导致一半的重新分配,并且可能具有更好的引用局部性。
另一种解决方案可能会搜索得更快,因为在搜索期间缓存行仅填充有索引数据。如果 T 是 8 字节类型,可能还会有一些对齐优势?
在我看来,成对向量是更好的解决方案,但两个容器都选择了另一个解决方案。为什么?
I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value.
So, sparse_vector would store the lazily-sorted index-value pairs for all indices whose value is not T().
I am basing my implementation on existing sparse vectors in numeric libraries-- though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector
and eigen::SparseVector
.
Both store:
size_t* indices_; // a dynamic array
T* values_; // a dynamic array
int size_;
int capacity_;
Why don't they simply use
vector<pair<size_t, T>> data_;
My main question is what are the pros and cons of both systems, and which is ultimately better?
The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference.
The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type?
It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
将索引放在单独的列表中将使它们查找起来更快 - 正如您所建议的,它将更有效地使用缓存,特别是如果 T 很大的话。
如果您想实现自己的,为什么不直接使用
std::map
(或std::unordered_map
)?密钥会更大,但实施时间将接近于零!Having indices in a separate list would make them faster to look up - as you suggest, it would use the cache more effectively, particularly if T is large.
If you want to implement your own, why not just use
std::map
(orstd::unordered_map
)? Keys would be larger but implementation time would be close to zero!实际上,他们似乎重新发明了轮子(可以这么说)。
我个人会考虑 2 个库来满足您的需求:
Loki::AssocVector
->通过向量(这就是您想要做的)作为评论,我要指出的是,您可能希望更通用一些,其值与
T()
不同,因为这将T
强制为可默认构造。您可以提供一个采用T const&
的构造函数。在编写通用容器时,最好尝试尽可能地减少必要的要求(只要不损害性能)。另外,我想提醒您,使用
向量
来存储的想法对于少量值来说非常好,但您可能希望将底层容器更改为经典的map 或
unordered_map
(如果值的数量增加)。可能值得进行分析/计时。请注意,STL 通过容器适配器(如 stack)提供了此功能,尽管它可能会使实现稍微困难一些。玩得开心。
Effectively, it seems that they reinvented the wheel (so to speak).
I would personally consider 2 libraries for your need:
Loki::AssocVector
-> the interface of a map implemented over avector
(which is what you wish to do)iterator_adaptor
class. Makes it very easy to implement a new container by Composition.As a remark, I would note that you may wish to be a little more generic that values different from the
T()
because this imposeT
to be DefaultConstructible. You could provide a constructor which takes aT const&
. When writing a generic container it is good to try and reduce the necessary requirements as much as possible (as long as it does not hurt performance).Also, I would remind you that the idea of using a
vector
for storage is very good for a little number of values, but you might wish to change the underlying container toward a classicmap
orunordered_map
if the number of values grows. It could be worth profiling/timing. Note that the STL offer this ability with the Container Adapters likestack
, even though it could make implementation slightly harder.Have fun.