About the extra indirection in deque
Wondering why my memory accesses were somewhat slower than I expected, I finally figured out that the Visual C++ implementation of deque indeed has an extra layer of indirection built in, destroying my memory locality. That is, it seems to hold an array of T*, not an array of T.

Is there another implementation I can use with VC++ that doesn't have this "feature", or is there some way (although I consider it unlikely) to avoid it in this implementation?

I'm basically looking for a vector that also has O(1) push/pop at the front.

I guess I could implement it myself, but dealing with allocators and such is a pain and it would take a while to get right, so I'd rather use something previously written/tested if possible.
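To illustrate what "implementing it myself" might roughly involve, here is a minimal sketch of such a container. The name front_vector is hypothetical; it skips allocator support entirely and assumes T is default-constructible and copy-assignable. It keeps spare slots at the front of one contiguous buffer, so push_front is amortized O(1) rather than worst-case O(1).

```cpp
// Minimal sketch (not production code): a vector-like container with spare
// room kept at the front, giving amortized O(1) push_front and push_back.
// Assumptions: T is default-constructible and copy-assignable; no allocator
// support; no bounds checking.
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

template <typename T>
class front_vector {
    std::vector<T> buf_;   // contiguous storage, including unused slack
    std::size_t head_ = 0; // index of the first live element
    std::size_t size_ = 0; // number of live elements

    void grow_front() {
        // Reserve new slack at the front proportional to the current size,
        // so repeated push_front stays amortized O(1) (like vector::push_back).
        std::size_t slack = size_ > 4 ? size_ : 4;
        std::vector<T> bigger(slack + buf_.size());
        for (std::size_t i = 0; i < size_; ++i)
            bigger[slack + i] = buf_[head_ + i];
        buf_ = std::move(bigger);
        head_ = slack;
    }

public:
    void push_front(const T& v) {
        if (head_ == 0) grow_front();
        buf_[--head_] = v;
        ++size_;
    }
    void push_back(const T& v) {
        if (head_ + size_ == buf_.size()) buf_.resize(buf_.size() * 2 + 4);
        buf_[head_ + size_] = v;
        ++size_;
    }
    void pop_front() { ++head_; --size_; }
    void pop_back()  { --size_; }

    T&       operator[](std::size_t i)       { return buf_[head_ + i]; }
    const T& operator[](std::size_t i) const { return buf_[head_ + i]; }
    std::size_t size() const { return size_; }
    T* data() { return buf_.data() + head_; } // elements stay contiguous
};

int main() {
    front_vector<int> v;
    for (int i = 0; i < 5; ++i) v.push_back(i);
    for (int i = 0; i < 5; ++i) v.push_front(-i);
    for (std::size_t i = 0; i < v.size(); ++i) std::cout << v[i] << ' ';
    std::cout << '\n';  // -4 -3 -2 -1 0 0 1 2 3 4
}
```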
3 Answers
容器的要求。但如果你不关心这些......For whatever reason, at least as of MSVC 2010, the
std::deque
implementation appears to make use of an unbelievably small block size (the max of 16 bytes or 1 single element if I'm not mistaken!).This, in my experience, can result in very significant performance issues, because essentially each "block" in the data structure only ends up storing a single element, which leads to all kinds of additional overhead (time and memory).
I don't know why it's done this way. As far as I understand it setting up a
deque
with such a small block size is exactly how it's not supposed to be done.Check out the
gcc stdlib
implementation. From memory they use a much larger block size.EDIT: In an attempt to address the other issues:
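To make the block-size point concrete, here is a small sketch that applies the "at most 16 bytes, or a single element if one element is already bigger than that" rule described above. The helper elems_per_block is an illustration of that rule, not a copy of the library's actual code.

```cpp
// Sketch of the block-size rule described above: blocks hold as many
// elements as fit in 16 bytes, or exactly one element if sizeof(T) > 16.
#include <cstddef>
#include <iostream>

template <typename T>
constexpr std::size_t elems_per_block() {
    return sizeof(T) <= 16 ? 16 / sizeof(T) : 1;
}

struct Big { double payload[8]; };  // 64 bytes

int main() {
    std::cout << "int    : " << elems_per_block<int>()    << " per block\n"; // 4
    std::cout << "double : " << elems_per_block<double>() << " per block\n"; // 2
    std::cout << "Big    : " << elems_per_block<Big>()    << " per block\n"; // 1
    // With one element per block, every access goes through the block map,
    // so iteration degenerates into chasing an array of pointers.
}
```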
EDIT: In an attempt to address the other issues: std::deque should have an extra layer of indirection. It is often implemented as a "blocked" data structure, i.e. it stores an array of "nodes" where each node is itself an array of data elements. It's not like a linked list - the array of nodes is never "traversed" like a list; it's always directly indexed (even in the case of 1 element per block).

Of course you can roll your own data structure that keeps some extra space at the front. It won't have worst-case O(1) push/pop front/back behaviour, and as such it won't satisfy the requirements of the std::deque container. But if you don't care about any of that...
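A minimal sketch of that blocked layout (hypothetical blocked_storage, fixed number of elements per block, no bounds checking): element i is located by index arithmetic on a map of block pointers, never by walking from node to node.

```cpp
// Sketch of a "blocked" layout: an array of pointers to fixed-size blocks.
// Random access is O(1) -- one division/modulo plus one extra pointer hop,
// which is the indirection (and locality cost) the question noticed.
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

template <typename T, std::size_t BlockSize = 8>
class blocked_storage {
    std::vector<std::unique_ptr<T[]>> map_;  // the extra level of indirection

public:
    explicit blocked_storage(std::size_t n) {
        std::size_t blocks = (n + BlockSize - 1) / BlockSize;
        for (std::size_t b = 0; b < blocks; ++b)
            map_.push_back(std::make_unique<T[]>(BlockSize));
    }

    // Directly indexed: no list traversal, even with one element per block.
    T& operator[](std::size_t i) {
        return map_[i / BlockSize][i % BlockSize];
    }
};

int main() {
    blocked_storage<int> s(20);
    for (std::size_t i = 0; i < 20; ++i) s[i] = static_cast<int>(i);
    std::cout << s[0] << ' ' << s[7] << ' ' << s[8] << ' ' << s[19] << '\n';
    // 0 7 8 19 -- element 8 lives in the second block and is reached
    // directly, not by walking a list of nodes.
}
```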
The C++ standard does not allow std::deque to do reallocations on pushes to the front or back. These operations are always constant time. Not amortized, always.

The C++ standard does not have such a container. Boost doesn't have one to my knowledge (though the Boost.Container library might; I haven't looked into it).
The indirection you complain about is actually mandated by the standard requirement that references/pointers are never invalidated by push/pop front/back (except those references/pointers to removed elements, obviously). So you see that this requirement has nothing to do with any complexity requirement.

This indirection also allows for faster (but still O(size)) push front/back when there is no available room.
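A small example of the stability guarantee mentioned above: pointers to existing deque elements remain valid across push_front/push_back (iterators may be invalidated, but not pointers or references), which is exactly the rule that forces the extra indirection.

```cpp
// Pushing at either end of a std::deque never invalidates pointers or
// references to existing elements; a std::vector would reallocate and
// invalidate them.
#include <deque>
#include <iostream>

int main() {
    std::deque<int> d = {1, 2, 3};
    int* p = &d[1];               // pointer to the middle element

    for (int i = 0; i < 1000; ++i) {
        d.push_front(-i);         // may allocate new blocks and grow the map,
        d.push_back(i);           // but existing elements are never moved
    }
    std::cout << *p << '\n';      // still prints 2
}
```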