When is memory typically allocated for local variables in C++?

Posted on 2024-11-29 17:24:40

I'm debugging a rather weird stack overflow, supposedly caused by allocating overly large variables on the stack, and I'd like to clarify the following.

Suppose I have the following function:

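#include <cstddef>

// Assumed declarations so the snippet compiles on its own.
extern bool condition;
void doSomething(char* data, std::size_t size);

void function()
{
    char buffer[1 * 1024];
    if (condition) {
        char buffer[1 * 1024];    // shadows the outer buffer
        doSomething(buffer, sizeof(buffer));
    } else {
        char buffer[512 * 1024];  // shadows the outer buffer
        doSomething(buffer, sizeof(buffer));
    }
}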

I understand that it's compiler-dependent and also depends on what the optimizer decides, but what is the typical strategy for allocating memory for those local variables?

Will the worst case (1 + 512 KB) be allocated as soon as the function is entered, or will 1 KB be allocated first and then, depending on the condition, an additional 1 KB or 512 KB?

Comments (5)

清晨说晚安 2024-12-06 17:24:41

Your local (stack) variables are allocated in the same space as the stack frames. When the function is called, the stack pointer is adjusted to "make room" for the stack frame, typically in a single step. If your local variables use up the stack, you'll get a stack overflow.

~512 KB is really too large for the stack in any case; you should allocate it on the heap using std::vector.
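
A minimal sketch of the std::vector suggestion, under some assumptions: the question does not show doSomething, so the version here is a hypothetical stand-in that just fills the buffer, and functionWithHeapBuffer is an illustrative name, not something from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's doSomething(): fills the buffer
// and reports how many bytes it touched.
std::size_t doSomething(char* data, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        data[i] = 'x';
    }
    return size;
}

// Illustrative rewrite of the question's function(): the buffer storage is
// on the heap, so the stack frame only holds the small vector object.
std::size_t functionWithHeapBuffer(bool condition) {
    const std::size_t size = condition ? 1 * 1024 : 512 * 1024;
    std::vector<char> buffer(size);  // released when buffer goes out of scope
    return doSomething(buffer.data(), buffer.size());
}
```

Only the branch that is actually taken pays for its buffer, and the allocation no longer counts against the stack limit.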

半步萧音过轻尘 2024-12-06 17:24:41

As you say, it is compiler-dependent, but you could consider using alloca to work around this. The memory is still allocated on the stack and is still freed automatically, although that happens when the calling function returns rather than at the end of the enclosing block. In exchange, you take control over when and whether the stack space is allocated.

While the use of alloca is typically discouraged, it does have its uses in situations such as this one.
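
A rough sketch of that idea, with loud caveats: alloca is not part of standard C++, the header `<alloca.h>` is assumed here (glibc and the BSDs provide it; other toolchains expose it differently), and both doSomething and functionWithAlloca are hypothetical names for illustration.

```cpp
#include <alloca.h>  // non-standard; assumed available (glibc, BSD, macOS)
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the question's doSomething(): zeroes the buffer.
std::size_t doSomething(char* data, std::size_t size) {
    std::memset(data, 0, size);
    return size;
}

std::size_t functionWithAlloca(bool condition) {
    const std::size_t size = condition ? 1 * 1024 : 512 * 1024;
    // Stack space is reserved here, and only as much as this path needs.
    // It is released when functionWithAlloca returns, not at a closing brace.
    char* buffer = static_cast<char*>(alloca(size));
    return doSomething(buffer, size);
}
```

The memory still comes from the stack, so a 512 KB request can overflow a small stack just as the fixed-size array would; the difference is that the cost is paid only on the path that asks for it.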

野生奥特曼 2024-12-06 17:24:40

On many platforms/ABIs, the entire stack frame (including memory for every local variable) is allocated when you enter the function. On others, it's common to push/pop memory bit by bit, as it is needed.

Of course, in cases where the entire stack frame is allocated in one go, different compilers might still decide on different stack frame sizes. In your case, some compilers would miss an optimization opportunity and allocate separate memory for every local variable, even the ones that live in different branches of the code (both the inner 1 * 1024 array and the 512 * 1024 one), whereas a better optimizing compiler should only allocate the maximum memory required by any single path through the function (the else path in your case, so one 512 KB block shared by the two branch-local arrays, on top of the outer 1 KB buffer, should be enough).
If you want to know what your platform does, look at the disassembly.

But it wouldn't surprise me to see the entire chunk of memory allocated immediately.
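
Besides reading the disassembly, a quick runtime probe can give a rough hint. The sketch below is an assumption-laden illustration, not a reliable test: branchBufferAddress and branchBuffersOverlap are hypothetical names, volatile is used only to discourage the compiler from discarding the arrays, and inlining or optimization settings can change the result.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the start address of whichever branch-local array is live.
std::uintptr_t branchBufferAddress(bool condition) {
    volatile char outer[1 * 1024];
    outer[0] = 1;
    if (condition) {
        volatile char small[1 * 1024];
        small[0] = outer[0];
        return reinterpret_cast<std::uintptr_t>(&small[0]);
    } else {
        volatile char large[512 * 1024];
        large[0] = outer[0];
        return reinterpret_cast<std::uintptr_t>(&large[0]);
    }
}

// True if the two arrays' address ranges intersect, which suggests the
// compiler reused stack space across the two branches.
bool branchBuffersOverlap() {
    const std::uintptr_t smallStart = branchBufferAddress(true);
    const std::uintptr_t largeStart = branchBufferAddress(false);
    return smallStart < largeStart + 512 * 1024 &&
           largeStart < smallStart + 1 * 1024;
}
```

Calling branchBuffersOverlap() from the same place with and without optimization shows whether your compiler shares the slot; the disassembly remains the authoritative answer.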

被你宠の有点坏 2024-12-06 17:24:40

I checked on LLVM:

void doSomething(char*, char*);

void function(bool b)
{
    char b1[1 * 1024];
    if (b) {
        char b2[1 * 1024];
        doSomething(b1, b2);
    } else {
        char b3[512 * 1024];
        doSomething(b1, b3);
    }
}

Yields:

; ModuleID = '/tmp/webcompile/_28066_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

define void @_Z8functionb(i1 zeroext %b) {
entry:
  %b1 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b2 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
  %b3 = alloca [524288 x i8], align 1            ; <[524288 x i8]*> [#uses=1]
  %arraydecay = getelementptr inbounds [1024 x i8]* %b1, i64 0, i64 0 ; <i8*> [#uses=2]
  br i1 %b, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %arraydecay2 = getelementptr inbounds [1024 x i8]* %b2, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay2)
  ret void

if.else:                                          ; preds = %entry
  %arraydecay6 = getelementptr inbounds [524288 x i8]* %b3, i64 0, i64 0 ; <i8*> [#uses=1]
  call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay6)
  ret void
}

declare void @_Z11doSomethingPcS_(i8*, i8*)

You can see the three alloca instructions at the top of the function.

I must admit I am slightly disappointed that b2 and b3 are not folded together in the IR, since only one of them will ever be used.

2024-12-06 17:24:40

This optimization is known as "stack coloring", because you're assigning multiple stack objects to the same address. This is an area that we know LLVM can improve. Currently LLVM only does this for stack objects created by the register allocator for spill slots. We'd like to extend this to handle user stack variables as well, but we need a way to capture the lifetime of the value in IR.

There is a rough sketch of how we plan to do this here:
http://nondot.org/sabre/LLVMNotes/MemoryUseMarkers.txt

Implementation work on this is underway; several pieces are already implemented in mainline.
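
To make the idea concrete, the effect of stack coloring can be imitated by hand. This is only an illustration of what the optimization achieves, not LLVM's implementation; the three-argument doSomething and the constant names are hypothetical, introduced so the example is self-contained.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kSmallSize = 1 * 1024;
constexpr std::size_t kLargeSize = 512 * 1024;
// Size of one slot that can hold either branch-local array.
constexpr std::size_t kSharedSize =
    kSmallSize > kLargeSize ? kSmallSize : kLargeSize;

// Hypothetical stand-in for doSomething(): zeroes the second buffer.
std::size_t doSomething(char* first, char* second, std::size_t secondSize) {
    first[0] = 0;
    std::memset(second, 0, secondSize);
    return secondSize;
}

std::size_t function(bool b) {
    char b1[kSmallSize];
    // b2 and b3 are never live at the same time, so one slot sized for the
    // larger of the two serves both branches: 1 KB + 512 KB in total instead
    // of 1 KB + 1 KB + 512 KB.
    char shared[kSharedSize];
    if (b) {
        return doSomething(b1, shared, kSmallSize);
    }
    return doSomething(b1, shared, kLargeSize);
}
```

A compiler that performs stack coloring on user variables reaches the same layout automatically, provided it can prove the two lifetimes do not overlap.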

-Chris
