Variable length array overhead in C++?
Looking at the question "Why does a C/C++ compiler need know the size of an array at compile time?", it occurred to me that compiler implementers should have had some time to get their feet wet by now (VLAs have been part of the C99 standard for 10 years) and to provide efficient implementations.
However, judging from the answers, VLAs still seem to be considered costly.
This surprises me somewhat.
Of course, I understand that a static offset is much better than a dynamic one in terms of performance, and, unlike one of the suggestions, I would not actually have the compiler perform a heap allocation of the array, since that would probably cost even more [this has not been measured ;)]
But I am still surprised at the supposed cost:
- if there is no VLA in a function, then there would not be any cost, as far as I can see.
- if there is one single VLA, then one can put it either before or after all the other variables, and therefore get a static offset for most of the stack frame (or so it seems to me, but I am not well-versed in stack management)
Of course, the question of multiple VLAs arises, and I was wondering whether a dedicated VLA stack would work. This means that a VLA would be represented by a count and a pointer (both of known size), with the actual memory taken from a secondary stack used only for this purpose (and thus really a stack too).
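A minimal sketch of the dedicated-VLA-stack idea, simulated in plain C so it can be run. The names vla_stack, vla_mark, vla_alloc, vla_release, vla_double and sum_of_products are hypothetical (nothing gcc or VC++ provides), and to keep it short the secondary stack holds only doubles. The fixed-size frame only ever holds the count/pointer pair, so its offsets stay static however many VLAs a function declares; a compiler using this scheme would presumably emit the mark/release pair on scope entry and exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical secondary stack reserved for VLA storage. It holds doubles
   only, and its capacity is an arbitrary choice for this sketch. A real
   implementation would need one such stack per thread. */
#define VLA_STACK_CAP 4096
static double vla_stack[VLA_STACK_CAP];
static size_t vla_top = 0;              /* elements currently in use */

/* What stays in the ordinary, fixed-size frame: a count and a pointer. */
typedef struct {
    size_t  count;
    double *data;
} vla_double;

/* On scope entry: remember the current top of the secondary stack. */
static size_t vla_mark(void) { return vla_top; }

/* Bump-allocate n doubles; returns {0, NULL} if the stack is exhausted. */
static vla_double vla_alloc(size_t n) {
    vla_double v = { 0, NULL };
    if (n > VLA_STACK_CAP - vla_top)
        return v;
    v.count = n;
    v.data  = &vla_stack[vla_top];
    vla_top += n;
    return v;
}

/* On scope exit: pop everything allocated since the mark. */
static void vla_release(size_t mark) {
    assert(mark <= vla_top);
    vla_top = mark;
}

/* A function "with two VLAs": frame offsets stay static, only the
   descriptors point into the secondary stack. Returns the sum of i * 2. */
static double sum_of_products(size_t n) {
    size_t mark = vla_mark();
    vla_double a = vla_alloc(n);
    vla_double b = vla_alloc(n);
    double total = 0.0;
    if (a.data != NULL && b.data != NULL) {
        for (size_t i = 0; i < a.count; i++) {
            a.data[i] = (double)i;
            b.data[i] = 2.0;
        }
        for (size_t i = 0; i < a.count; i++)
            total += a.data[i] * b.data[i];
    }
    vla_release(mark);
    return total;
}
```

Releasing is just resetting an index, so it is as cheap as popping the regular stack; the price moves to the pointer load needed before each VLA access.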
[rephrasing]
How are VLAs implemented in gcc / VC++?
Is the cost really that impressive?
[end rephrasing]
It seems to me that, even with present implementations, it can only be better than using, say, a std::vector, since you do not incur the cost of a dynamic allocation (at the cost of not being resizable).
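To make the comparison concrete in C, which has no std::vector, here is a sketch of the same computation written once with a C99 VLA and once with a malloc'd buffer standing in for the heap-allocated vector; the function names are made up for illustration. Note that VLAs are not part of standard C++ (g++ accepts them only as a GNU extension), so this has to be compiled as C.

```c
#include <assert.h>
#include <stdlib.h>

/* VLA version: storage is carved from the stack at run time;
   no allocator call, released automatically on return. */
static long sum_squares_vla(int n) {
    assert(n > 0);                 /* a VLA of size <= 0 is undefined */
    long tmp[n];
    long total = 0;
    for (int i = 0; i < n; i++) tmp[i] = (long)i * i;
    for (int i = 0; i < n; i++) total += tmp[i];
    return total;
}

/* Heap version (stand-in for a std::vector): same indexing,
   but pays for malloc/free and must handle allocation failure. */
static long sum_squares_heap(int n) {
    assert(n > 0);
    long *tmp = malloc((size_t)n * sizeof *tmp);
    long total = 0;
    if (tmp == NULL) return -1;    /* sketch: -1 signals failure */
    for (int i = 0; i < n; i++) tmp[i] = (long)i * i;
    for (int i = 0; i < n; i++) total += tmp[i];
    free(tmp);
    return total;
}
```

Element access is identical in both versions; the difference lies in allocation and release, plus the fact that the VLA version has no way to report that the stack is exhausted.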
EDIT:
There is a partial response here; however, comparing VLAs to traditional arrays seems unfair. If we knew the size beforehand, we would not need a VLA. In the same question AndreyT gave some pointers regarding the implementation, but they are not as precise as I would like.
Comments (2)
AFAIK VC++ doesn't implement VLAs. It's a C++ compiler and supports only C89 (no VLAs, no restrict). I don't know how gcc implements VLAs, but the fastest possible way is to store the pointer to each VLA and its size in the static portion of the stack frame. This way you can access one of the VLAs with the performance of a constant-sized array: the last VLA if the stack grows downwards as on x86 (dereference [stack pointer + index*element size + size of the last temporary pushes]), or the first VLA if it grows upwards (dereference [stack frame pointer + offset from stack frame + index*element size]). All the other VLAs need one more indirection to get their base address from the static portion of the stack.
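The layout this answer describes can be modelled in ordinary C. This is a hypothetical simulation, not what gcc actually emits: fake_stack stands in for stack memory growing downward as on x86, frame_static for the static part of the frame, and the temporary pushes mentioned above are ignored. It shows why the last VLA is reachable straight from the stack pointer, while any other VLA first needs its base loaded from its descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a downward-growing (x86-style) stack.
   fake_stack stands in for stack memory; temporaries are ignored. */
#define FAKE_STACK_INTS 1024
static int fake_stack[FAKE_STACK_INTS];

/* Descriptor kept in the static part of the frame: base + size. */
typedef struct {
    int   *base;
    size_t count;
} vla_desc;

/* Static part of the frame: fixed locals at constant offsets,
   plus one descriptor per VLA. */
typedef struct {
    int      fixed_local;
    vla_desc first;
    vla_desc last;
} frame_static;

/* "Prologue": carve two VLAs out below the static part, first then
   last, and return the resulting stack pointer. */
static int *make_frame(frame_static *f, size_t n_first, size_t n_last) {
    assert(n_first + n_last <= FAKE_STACK_INTS);
    int *sp = fake_stack + FAKE_STACK_INTS;   /* empty stack */
    sp -= n_first;
    f->first.base  = sp;
    f->first.count = n_first;
    sp -= n_last;                             /* last VLA sits at sp */
    f->last.base   = sp;
    f->last.count  = n_last;
    return sp;
}

/* Last VLA: [stack pointer + index * element size], no extra load. */
static int *last_elem(int *sp, size_t i) { return &sp[i]; }

/* Any other VLA: load its base from the descriptor first
   (one more indirection), then index. */
static int *first_elem(const frame_static *f, size_t i) {
    return &f->first.base[i];
}
```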
[ Edit: Also, when a VLA is used, the compiler can't omit the stack-frame base pointer, which is otherwise redundant because all offsets from the stack pointer can be computed at compile time. So you have one less free register. - end edit ]
Not really. Moreover, if you don't use it, you don't pay for it.
[ Edit: Probably a more correct answer would be: compared to what? Compared to a heap-allocated vector, the access time will be the same, but allocation and deallocation will be faster. - end edit ]
If it were to be implemented in VC++, I would assume the compiler team would use some variant of _alloca(size). And I think the cost is equivalent to using variables with greater than 8-byte alignment on the stack (such as __m128); the compiler has to store the original stack pointer somewhere, and aligning the stack requires an extra register to hold the unaligned stack pointer. So the overhead is basically an extra indirection (you have to store the address of the VLA somewhere) plus register pressure from also storing the original stack range somewhere.
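Since _alloca is MSVC-specific, the sketch below uses a C99 VLA to do by hand roughly what dynamic stack realignment involves, assuming float is 4 bytes and 4-byte aligned (true on common x86/x64 targets). It over-allocates, rounds the start up to 16 bytes (the alignment __m128 needs), and ends up with two addresses for one object: the raw extent and the aligned base. sum_aligned and ALIGN are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Target alignment; 16 bytes is what __m128 needs. */
#define ALIGN 16

/* Hypothetical: fill n floats in 16-byte-aligned stack storage and return
   their sum. *misalign_out receives the aligned address modulo ALIGN.
   Assumes float is 4 bytes and 4-byte aligned. */
static float sum_aligned(size_t n, unsigned *misalign_out) {
    assert(n > 0);
    /* Over-allocate so that up to ALIGN/sizeof(float) - 1 leading
       elements can be skipped to reach an aligned address. */
    float raw[n + ALIGN / sizeof(float) - 1];
    uintptr_t a = (uintptr_t)raw;
    size_t skip = ((ALIGN - a % ALIGN) % ALIGN) / sizeof(float);
    /* raw marks the real extent of the storage; v is the aligned base
       actually used. A compiler realigning the stack must likewise keep
       the original stack pointer around to restore it on return. */
    float *v = raw + skip;
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) v[i] = 1.5f;
    for (size_t i = 0; i < n; i++) total += v[i];
    *misalign_out = (unsigned)((uintptr_t)v % ALIGN);
    return total;
}
```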