Alloca implementation

Posted 2024-07-16 11:52:34

How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help because they perform so many optimizations, and I just want the canonical form.

Edit: I guess the hard part is that I want this to have normal function call syntax, i.e. using a naked function or something, make it look like the normal alloca().

Edit # 2: Ah, what the heck, you can assume that we're not omitting the frame pointer.

Comments (12)

厌倦 2024-07-23 11:52:34

Implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as:

sub esp, <size>

which is unfortunately only half of the picture. Yes, that would "allocate space on the stack", but there are a couple of gotchas.

  1. If the compiler has emitted code that references other variables
    relative to esp instead of ebp (typical if you compile without a
    frame pointer), those references need to be adjusted. Even with
    frame pointers, compilers sometimes do this.

  2. More importantly, by definition, space allocated with alloca must be
    "freed" when the function exits.

The big one is point #2, because you need the compiler to emit code that symmetrically adds <size> back to esp at every exit point of the function.

The most likely case is that the compiler offers some intrinsics which allow library writers to ask the compiler for the help needed.

EDIT:

In fact, in glibc (GNU's implementation of libc), the implementation of alloca is simply this:

#ifdef  __GNUC__
# define __alloca(size) __builtin_alloca (size)
#endif /* GCC.  */

EDIT:

After thinking about it, I believe the minimum required would be for the compiler to always use a frame pointer in any function that uses alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely, and the frame cleanup would be handled by restoring esp from the frame pointer.
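As a rough illustration of leaning on the compiler rather than hand-written assembly, here is a minimal sketch that calls the `__builtin_alloca` builtin named in the glibc macro above. It assumes GCC or Clang; the function name `copy_reversed` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the reverse of s into out (which must hold
   strlen(s) + 1 bytes) and returns the string length. */
size_t copy_reversed(const char *s, char *out)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);

    /* The compiler reserves len + 1 bytes in this function's own frame
       and knows that the stack pointer has moved. */
    char *tmp = __builtin_alloca(len + 1);

    for (size_t i = 0; i < len; i++)
        tmp[i] = s[len - 1 - i];
    tmp[len] = '\0';

    memcpy(out, tmp, len + 1);
    return len; /* tmp is released when the function returns */
}
```

Because the compiler sees the builtin, it can keep its own addressing of locals consistent and undo the allocation at every exit point, which is exactly the help the inline-assembly approach lacks.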

EDIT:

So I did some experimenting with things like this:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define __alloca(p, N) \
    do { \
        __asm__ __volatile__( \
        "sub %1, %%esp \n" \
        "mov %%esp, %0  \n" \
         : "=m"(p) \
         : "i"(N) \
         : "esp"); \
    } while(0)

int func() {
    char *p;
    __alloca(p, 100);
    memset(p, 0, 100);
    strcpy(p, "hello world\n");
    printf("%s\n", p);
}

int main() {
    func();
}

which unfortunately does not work correctly. After analyzing the assembly output by gcc, it appears that optimizations get in the way. The problem seems to be that, since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing things in an unexpected order and still referencing things via esp.

Here's the resultant ASM:

8048454: push   ebp
8048455: mov    ebp,esp
8048457: sub    esp,0x28
804845a: sub    esp,0x64                      ; <- this and the line below are our "alloc"
804845d: mov    DWORD PTR [ebp-0x4],esp
8048460: mov    eax,DWORD PTR [ebp-0x4]
8048463: mov    DWORD PTR [esp+0x8],0x64      ; <- whoops! compiler still referencing via esp
804846b: mov    DWORD PTR [esp+0x4],0x0       ; <- whoops! compiler still referencing via esp
8048473: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp           
8048476: call   8048338 <memset@plt>
804847b: mov    eax,DWORD PTR [ebp-0x4]
804847e: mov    DWORD PTR [esp+0x8],0xd       ; <- whoops! compiler still referencing via esp
8048486: mov    DWORD PTR [esp+0x4],0x80485a8 ; <- whoops! compiler still referencing via esp
804848e: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
8048491: call   8048358 <memcpy@plt>
8048496: mov    eax,DWORD PTR [ebp-0x4]
8048499: mov    DWORD PTR [esp],eax           ; <- whoops! compiler still referencing via esp
804849c: call   8048368 <puts@plt>
80484a1: leave
80484a2: ret

As you can see, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.

晚雾 2024-07-23 11:52:34

It would be tricky to do this. In fact, unless you have enough control over the compiler's code generation, it cannot be done entirely safely. Your routine would have to manipulate the stack such that, when it returned, everything was cleaned up but the stack pointer remained positioned so that the block of memory stayed in place.

The problem is that unless you can inform the compiler that the stack pointer has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer, but the offsets will be incorrect.

ι不睡觉的鱼゛ 2024-07-23 11:52:34

For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.

我家小可爱 2024-07-23 11:52:34

The C and C++ standards don't specify that alloca() has to use the stack, because alloca() isn't in the C or C++ standards (or POSIX for that matter)¹.

A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (as documented on their website), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.

Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.
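The "stack if it fits, heap otherwise, explicit free" idea can be sketched in portable C. This is not how _malloca() is implemented; `scratch_alloc`, `scratch_free` and `sum_squares` are hypothetical names invented for this illustration, and the fixed 64-element buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: hand back the caller's stack buffer when the
   request fits, otherwise fall back to the heap. */
static void *scratch_alloc(void *stack_buf, size_t stack_size, size_t n)
{
    assert(stack_buf != NULL);
    return n <= stack_size ? stack_buf : malloc(n);
}

/* Must be called on every exit path, mirroring the _freea() requirement. */
static void scratch_free(void *p, void *stack_buf)
{
    if (p != stack_buf)
        free(p);
}

/* Example user: sums 1^2 + 2^2 + ... + n^2 through a temporary array. */
unsigned long sum_squares(size_t n)
{
    unsigned long small[64]; /* the "stack" part, fixed at compile time */
    unsigned long *tmp = scratch_alloc(small, sizeof small, n * sizeof *tmp);
    if (tmp == NULL)
        return 0; /* heap fallback failed */

    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        tmp[i] = (unsigned long)(i + 1) * (i + 1);
    for (size_t i = 0; i < n; i++)
        total += tmp[i];

    scratch_free(tmp, small);
    return total;
}
```

The explicit `scratch_free` call is the price of doing this without compiler support: nothing releases the heap block automatically when the function returns.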

(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)

¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").

愛放△進行李 2024-07-23 11:52:34

Continuation Passing Style Alloca

Variable-Length Array in pure ISO C++. Proof-of-Concept implementation.

Usage

void foo(unsigned n)
{
    cps_alloca<Payload>(n,[](Payload *first,Payload *last)
    {
        fill(first,last,something);
    });
}

Core Idea

template<typename T,unsigned N,typename F>
auto cps_alloca_static(F &&f) -> decltype(f(nullptr,nullptr))
{
    T data[N];
    return f(&data[0],&data[0]+N);
}

template<typename T,typename F>
auto cps_alloca_dynamic(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
    vector<T> data(n);
    return f(&data[0],&data[0]+n);
}

template<typename T,typename F>
auto cps_alloca(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
    switch(n)
    {
        case 1: return cps_alloca_static<T,1>(f);
        case 2: return cps_alloca_static<T,2>(f);
        case 3: return cps_alloca_static<T,3>(f);
        case 4: return cps_alloca_static<T,4>(f);
        case 0: return f(nullptr,nullptr);
        default: return cps_alloca_dynamic<T>(n,f);
    } // mpl::for_each / array / index pack / recursive bsearch / etc. variation
}

LIVE DEMO

cps_alloca on github

温馨耳语 2024-07-23 11:52:34

alloca is directly implemented in assembly code.
That's because you cannot control stack layout directly from high level languages.

Also note that most implementations will perform some additional optimization, like aligning the stack for performance reasons.
The standard way of allocating stack space on X86 looks like this:

sub esp, XXX

where XXX is the number of bytes to allocate.

Edit:
If you want to look at the implementation (and you're using MSVC) see alloca16.asm and chkstk.asm.
The code in the first file basically aligns the desired allocation size to a 16-byte boundary. Code in the second file actually walks all pages which would belong to the new stack area and touches them. This may trigger guard-page (PAGE_GUARD) exceptions, which the OS uses to grow the stack.
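The two steps described above (rounding the size to a 16-byte boundary, then touching each page) come down to simple arithmetic. The sketch below is not taken from alloca16.asm or chkstk.asm; the function names are made up, and the page size is passed in as a parameter because its value (commonly 4096) is an assumption about the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align; align must be a power of two. */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Number of pages spanned by n bytes that start on a page boundary.
   A probing loop would touch one address in each of these pages, in
   order, so that the guard page is hit before any page beyond it. */
size_t pages_to_probe(size_t n, size_t page_size)
{
    return round_up(n, page_size) / page_size;
}
```

For example, a 100-byte request becomes 112 bytes after 16-byte rounding, and a 4097-byte request spans two 4096-byte pages.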

独留℉清风醉 2024-07-23 11:52:34

You can examine the sources of an open-source C compiler, like Open Watcom, and find it yourself.

悲欢浪云 2024-07-23 11:52:34

If you can't use C99's variable-length arrays, you can use a compound literal cast to a void pointer.

#define ALLOCA(sz) ((void*)((char[sz]){0}))

This also works with -ansi (as a gcc extension), and even when it is a function argument:

some_func(&useful_return, ALLOCA(sizeof(struct useless_return)));

The downside is that when compiled as C++, g++ > 4.6 will give you "error: taking address of temporary array"; clang and icc don't complain, though.
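A minimal sketch of the macro in use, compiled as C. Two points are worth stating as assumptions of this example: the size passed to ALLOCA is a compile-time constant here (C99 does not permit a compound literal of variable-length array type), and the storage lives only until the end of the enclosing block. The function `format_pair` is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ALLOCA(sz) ((void*)((char[sz]){0}))

/* Hypothetical helper: formats "a,b" into out and returns its length. */
size_t format_pair(int a, int b, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    /* 32 zero-initialized bytes with automatic storage; they remain
       valid until the end of this function body. */
    char *tmp = ALLOCA(32);

    snprintf(tmp, 32, "%d,%d", a, b);
    snprintf(out, out_size, "%s", tmp);
    return strlen(out);
}
```

Unlike a real alloca(), the buffer here is tied to the block in which the macro appears rather than to the function as a whole.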

謸气贵蔟 2024-07-23 11:52:34

Alloca is easy: you just move the stack pointer down (the x86 stack grows toward lower addresses), then generate all the reads and writes to point to this new block.

sub esp, 4

东京女 2024-07-23 11:52:34

What we want to do is something like this:

void* alloca(size_t size) {
    <sp> -= size;
    return <sp>;
}

In assembly (Visual Studio 2017, 64-bit) it looks like:

;alloca.asm

_TEXT SEGMENT
    PUBLIC alloca
    alloca PROC
        sub rsp, rcx ;<sp> -= size
        mov rax, rsp ;return <sp>;
        ret
    alloca ENDP
_TEXT ENDS

END

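Unfortunately, the return address is the last item on the stack, and we do not want to overwrite it. We also need to take care of alignment: the Windows x64 calling convention keeps rsp 16-byte aligned at call sites, so the size is rounded up to a multiple of 16. The caller's later calls use the 32 bytes at [rsp] as shadow space, so the returned block has to start above them. Finally, rbx is a nonvolatile register in this convention, so a volatile register such as rdx holds the return address instead. This still relies on the caller restoring rsp from a frame pointer on exit. So we have to do this:

;alloca.asm

_TEXT SEGMENT
    PUBLIC alloca
    alloca PROC
        ;round up to a multiple of 16, plus 32 bytes of shadow space
        add rcx, 47
        and rcx, -16

        ;move the stack pointer down, keeping the return address on top
        pop rdx              ;return address (rdx is volatile)
        sub rsp, rcx
        lea rax, [rsp+32]    ;block starts above the shadow space
        push rdx
        ret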
    alloca ENDP
_TEXT ENDS

END

恋你朝朝暮暮 2024-07-23 11:52:34

my_alloca:  ; void *my_alloca(int size); assumes 32-bit cdecl (caller pops the argument)
    MOV     ECX, [ESP+4]    ; get size
    ADD     ECX,-4          ; include return address as stack space(4bytes)
    SUB     ESP,ECX
    LEA     EAX, [ESP+4]    ; return value: block start once the caller has popped the argument
    JMP     DWORD [ESP+ECX]     ; replace RET(do not pop return address)

﹏半生如梦愿梦如真 2024-07-23 11:52:34

I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).
