
Alloca implementation

Posted on 2024-07-16 11:52:34

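How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help, because they perform so many optimizations, and I just want the canonical form.

Edit: I guess the hardest part is that I want this to have normal function call syntax, i.e. using a naked function or something similar, so that it looks like the normal alloca().

Edit #2: Ah, what the heck, you can assume that we're not omitting the frame pointer.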


厌倦 2024-07-23 11:52:34

Implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as:

sub esp, <size>

which is unfortunately only half of the picture. Yes, that would "allocate space on the stack", but there are a couple of gotchas.

1. If the compiler has emitted code that references other variables relative to esp instead of ebp (typical if you compile with no frame pointer), then those references need to be adjusted. Even with frame pointers, compilers do this sometimes.
2. More importantly, by definition, space allocated with alloca must be "freed" when the function exits.

The big one is point #2, because you need the compiler to emit code that symmetrically adds <size> back to esp at every exit point of the function.

The most likely case is the compiler offers some intrinsics which allow library writers to ask the compiler for the help needed.

EDIT:

In fact, in glibc (GNU's implementation of libc), the implementation of alloca is simply this:

#ifdef  __GNUC__
# define __alloca(size) __builtin_alloca (size)
#endif /* GCC.  */

EDIT:

After thinking about it, I believe the minimum requirement would be for the compiler to always use a frame pointer in any function that uses alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely, and the frame cleanup would be handled by restoring esp from the frame pointer.
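
As a minimal sketch of what that compiler help looks like from the caller's side, the function below asks for stack space through the builtin that the glibc macro expands to. It assumes a GCC or Clang toolchain; copy_and_measure is a hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into a stack block obtained from the compiler builtin and
 * returns the length of the copy. The block is reclaimed automatically
 * when the function returns; there is no explicit free. */
size_t copy_and_measure(const char *src) {
    size_t n = strlen(src) + 1;        /* include the terminating NUL */
    char *buf = __builtin_alloca(n);   /* assumption: GCC/Clang builtin */
    memcpy(buf, src, n);
    return strlen(buf);
}
```

Because the compiler sees the builtin, it can keep addressing locals correctly and restore the stack pointer on every exit path, which is exactly what the inline-assembly approach cannot guarantee.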

EDIT:

So I did some experimenting with things like this:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define __alloca(p, N) \
    do { \
        __asm__ __volatile__( \
        "sub %1, %%esp \n" \
        "mov %%esp, %0  \n" \
         : "=m"(p) \
         : "i"(N) \
         : "esp"); \
    } while(0)

int func() {
    char *p;
    __alloca(p, 100);
    memset(p, 0, 100);
    strcpy(p, "hello world\n");
    printf("%s\n", p);
}

int main() {
    func();
}

which unfortunately does not work correctly. After analyzing the assembly output by gcc, it appears that optimizations get in the way. The problem seems to be that, since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing things in an unexpected order and still referencing things via esp.

Here's the resultant ASM:

8048454: push   ebp
8048455: mov    ebp,esp
8048457: sub    esp,0x28
804845a: sub    esp,0x64                      ; <- this and the line below are our "alloc"
804845d: mov    DWORD PTR [ebp-0x4],esp
8048460: mov    eax,DWORD PTR [ebp-0x4]
8048463: mov    DWORD PTR [esp+0x8],0x64      ; <- whoops! compiler still referencing via esp
804846b: mov    DWORD PTR [esp+0x4],0x0       ; <- whoops! compiler still referencing via esp
8048473: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp           
8048476: call   8048338 <memset@plt>
804847b: mov    eax,DWORD PTR [ebp-0x4]
804847e: mov    DWORD PTR [esp+0x8],0xd       ; <- whoops! compiler still referencing via esp
8048486: mov    DWORD PTR [esp+0x4],0x80485a8 ; <- whoops! compiler still referencing via esp
804848e: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
8048491: call   8048358 <memcpy@plt>
8048496: mov    eax,DWORD PTR [ebp-0x4]
8048499: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
804849c: call   8048368 <puts@plt>
80484a1: leave
80484a2: ret

As you can see, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.


晚雾 2024-07-23 11:52:34

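It would be tricky to do this. In fact, unless you have enough control over the compiler's code generation, it cannot be done entirely safely. Your routine would have to manipulate the stack so that, when it returns, everything has been cleaned up, but the stack pointer is left in a position where the block of memory stays reserved.

The problem is that unless you can inform the compiler that the stack pointer has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer, but the offsets will be incorrect.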


ι不睡觉的鱼゛ 2024-07-23 11:52:34

For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.


我家小可爱 2024-07-23 11:52:34

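The C and C++ standards don't specify that alloca() has to use the stack, because alloca() isn't in the C or C++ standards (or POSIX, for that matter) [1].

A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (as referenced on their website), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but it still requires compiler support.

Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need or want explicit freeing.

(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression, so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions, such as function arguments, anyway.)

[1] Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").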


愛放△進行李 2024-07-23 11:52:34

Continuation Passing Style Alloca

Variable-Length Array in pure ISO C++. Proof-of-Concept implementation.

Usage

void foo(unsigned n)
{
    cps_alloca<Payload>(n,[](Payload *first,Payload *last)
    {
        fill(first,last,something);
    });
}

Core Idea

template<typename T,unsigned N,typename F>
auto cps_alloca_static(F &&f) -> decltype(f(nullptr,nullptr))
{
    T data[N];
    return f(&data[0],&data[0]+N);
}

template<typename T,typename F>
auto cps_alloca_dynamic(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
    vector<T> data(n);
    return f(&data[0],&data[0]+n);
}

template<typename T,typename F>
auto cps_alloca(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
    switch(n)
    {
        case 1: return cps_alloca_static<T,1>(f);
        case 2: return cps_alloca_static<T,2>(f);
        case 3: return cps_alloca_static<T,3>(f);
        case 4: return cps_alloca_static<T,4>(f);
        case 0: return f(nullptr,nullptr);
        default: return cps_alloca_dynamic<T>(n,f);
    } // mpl::for_each / array / index pack / recursive bsearch / etc. variation
}

LIVE DEMO

cps_alloca on github
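
For readers working in plain C, the same continuation-passing idea can be sketched with a function pointer instead of a lambda. This is an adaptation, not the code from the linked repository: cps_alloca_int, cps_body, fill_and_sum, demo_sum and the CPS_STACK_ELEMS threshold are all hypothetical names, and the sketch handles only int elements with a single fixed-size stack buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { CPS_STACK_ELEMS = 64 };   /* hypothetical threshold for the stack path */

/* Continuation: receives the [first, last) range plus a caller context. */
typedef long (*cps_body)(int *first, int *last, void *ctx);

/* Runs body on a temporary array of n ints. Small requests use a fixed
 * stack array; larger ones fall back to malloc. Returns body's result,
 * or -1 if the heap allocation fails (sketch-level error signalling). */
long cps_alloca_int(size_t n, cps_body body, void *ctx) {
    if (n == 0)
        return body(NULL, NULL, ctx);
    if (n <= (size_t)CPS_STACK_ELEMS) {
        int data[CPS_STACK_ELEMS];
        return body(data, data + n, ctx);
    }
    if (n > SIZE_MAX / sizeof(int))
        return -1;                       /* size computation would overflow */
    int *data = malloc(n * sizeof *data);
    if (data == NULL)
        return -1;
    long result = body(data, data + n, ctx);
    free(data);                          /* released after the continuation returns */
    return result;
}

/* Example continuation: fill the range with *ctx and return the sum. */
static long fill_and_sum(int *first, int *last, void *ctx) {
    int value = *(const int *)ctx;
    long sum = 0;
    for (int *p = first; p != last; ++p) {
        *p = value;
        sum += *p;
    }
    return sum;
}

/* Small driver so the sketch can be exercised with plain arguments. */
long demo_sum(size_t n, int value) {
    return cps_alloca_int(n, fill_and_sum, &value);
}
```

The buffer never escapes the call that owns it, so no stack-pointer tricks are needed; the cost is that the caller's code has to live inside the continuation.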


温馨耳语 2024-07-23 11:52:34

alloca is implemented directly in assembly code, because you cannot control the stack layout directly from high-level languages.

Also note that most implementations perform some additional work, such as aligning the stack for performance reasons.
The standard way of allocating stack space on x86 looks like this:

sub esp, XXX

where XXX is the number of bytes to allocate.

Edit:
If you want to look at the implementation (and you're using MSVC), see alloca16.asm and chkstk.asm.
The code in the first file basically aligns the desired allocation size to a 16-byte boundary. The code in the second file walks all pages that would belong to the new stack area and touches them. This may trigger PAGE_GUARD exceptions, which the OS uses to grow the stack.
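
The arithmetic behind those two steps can be sketched in C. This is not the MSVC source: align_up_16, pages_to_probe and the 4096-byte page size are assumptions made for illustration, and the page count assumes the block starts on a page boundary.

```c
#include <stddef.h>

enum { PAGE_SIZE_ASSUMED = 4096 };   /* assumption: 4 KiB pages */

/* Round a request up to the next multiple of 16 bytes.
 * Ignores overflow for sizes within 15 bytes of SIZE_MAX. */
size_t align_up_16(size_t size) {
    return (size + 15u) & ~(size_t)15u;
}

/* Number of pages a probe loop would have to touch for a block of the
 * rounded size, assuming the block starts on a page boundary. */
size_t pages_to_probe(size_t size) {
    size_t rounded = align_up_16(size);
    return (rounded + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}
```

For example, a 17-byte request becomes 32 bytes, and a 4097-byte request spans two pages under these assumptions.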


独留℉清风醉 2024-07-23 11:52:34

You can examine the source code of an open-source C compiler, such as Open Watcom, and find it yourself.


悲欢浪云 2024-07-23 11:52:34

If you can't use C99's variable-length arrays, you can use a compound literal cast to a void pointer.

#define ALLOCA(sz) ((void*)((char[sz]){0}))

This also works with -ansi (as a gcc extension), even when it is a function argument:

some_func(&useful_return, ALLOCA(sizeof(struct useless_return)));

The downside is that when compiled as C++, g++ > 4.6 gives an error: taking address of temporary array. clang and icc don't complain, though.
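
A small usage sketch of the macro follows, under a few stated assumptions. In C99 and C11 a compound literal cannot have a variable-length array type, so sz has to be a constant expression; the buffer lives only as long as the enclosing block; and because it is a char array, the sketch uses it as raw bytes rather than casting it to a struct with stricter alignment. format_pair, demo_scratch and SCRATCH_SIZE are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define ALLOCA(sz) ((void *)((char[sz]){0}))

enum { SCRATCH_SIZE = 64 };   /* must be a constant expression */

/* Formats "a,b" into the caller-provided scratch buffer and returns
 * the length of the resulting string. */
static size_t format_pair(char *scratch, size_t cap, int a, int b) {
    snprintf(scratch, cap, "%d,%d", a, b);
    return strlen(scratch);
}

/* The compound literal is created in this function's block, so it stays
 * valid for the whole call to format_pair. */
size_t demo_scratch(int a, int b) {
    return format_pair(ALLOCA(SCRATCH_SIZE), SCRATCH_SIZE, a, b);
}
```

Unlike a real alloca(), the size here is fixed at compile time, so this is closer to an anonymous local array than to a dynamic stack allocation.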


謸气贵蔟 2024-07-23 11:52:34

Alloca is easy: you just move the stack pointer (down, on x86, where the stack grows toward lower addresses), then generate all the reads/writes to point to this new block

sub esp, 4


东京女 2024-07-23 11:52:34

What we want to do is something like this:

void* alloca(size_t size) {
    <sp> -= size;
    return <sp>;
}

In assembly (Visual Studio 2017, 64-bit) it looks like:

;alloca.asm

_TEXT SEGMENT
    PUBLIC alloca
    alloca PROC
        sub rsp, rcx ;<sp> -= size
        mov rax, rsp ;return <sp>;
        ret
    alloca ENDP
_TEXT ENDS

END

Unfortunately, our return address is the last item on the stack, and we do not want to overwrite it. Additionally, we need to take care of alignment, i.e. round size up to a multiple of 8. So we have to do this:

;alloca.asm

_TEXT SEGMENT
    PUBLIC alloca
    alloca PROC
        ;round up to multiple of 8
        add rcx, 7
        and rcx, -8

        ;move the stack pointer down, keeping the return address on top
        ;(rdx is volatile in the Windows x64 convention; rbx is not and must not be clobbered)
        pop rdx
        sub rsp, rcx
        mov rax, rsp
        push rdx
        ret
    alloca ENDP
_TEXT ENDS

END
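
The rounding step is easy to get wrong in assembly, so it can help to model it in C first. The sketch below shows two equivalent ways to round up to a multiple of 8, a remainder-based form and a mask-based form, plus a helper that checks they agree. All three function names are hypothetical.

```c
#include <stddef.h>

/* Remainder-based rounding: add the distance to the next multiple of 8.
 * The outer % 8 keeps the padding at 0 when size is already aligned. */
size_t round_up_8_remainder(size_t size) {
    size_t pad = (8 - size % 8) % 8;
    return size + pad;
}

/* Mask-based rounding: add 7, then clear the low three bits. */
size_t round_up_8_mask(size_t size) {
    return (size + 7) & ~(size_t)7;
}

/* Returns 1 when both forms agree for every size in [0, limit). */
int rounding_agrees(size_t limit) {
    for (size_t s = 0; s < limit; ++s) {
        if (round_up_8_remainder(s) != round_up_8_mask(s))
            return 0;
    }
    return 1;
}
```

The mask form needs no division, which is why it is the usual choice when the alignment is a power of two.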


恋你朝朝暮暮 2024-07-23 11:52:34

my_alloca:  ; void *my_alloca(int size);
    MOV     EAX, [ESP+4]    ; get size
    ADD     EAX,-4          ; include return address as stack space(4bytes)
    SUB     ESP,EAX
    JMP     DWORD [ESP+EAX]     ; replace RET(do not pop return address)
