Alloca implementation
How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help because they perform so many optimizations, and I just want the canonical form.
Edit: I guess the hard part is that I want this to have normal function call syntax, i.e. using a naked function or something, to make it look like the normal alloca().
Edit #2: Ah, what the heck, you can assume that we're not omitting the frame pointer.
Implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as "sub esp, <size>", which is unfortunately only half of the picture. Yes, that would "allocate space on the stack", but there are a couple of gotchas:

1. If the compiler has emitted code that references other variables relative to esp instead of ebp (typical if you compile without a frame pointer), then those references need to be adjusted. Even with frame pointers, compilers do this sometimes.
2. More importantly, by definition, space allocated with alloca must be "freed" when the function exits.

The big one is point #2, because you need the compiler to emit code that symmetrically adds <size> back to esp at every exit point of the function. The most likely case is that the compiler offers some intrinsics which allow library writers to ask the compiler for the help needed.

EDIT: In fact, in glibc (GNU's implementation of libc), the implementation of alloca is simply a macro that expands to the compiler builtin __builtin_alloca(size).

EDIT: After thinking about it, the minimum I believe would be required is for the compiler to always use a frame pointer in any function that uses alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely, and the frame cleanup would be handled by copying the frame pointer back into esp.

EDIT: So I did some experimenting with an inline-assembly version, which unfortunately does not work correctly. After analyzing the assembly output by gcc, it appears that optimizations get in the way. The problem seems to be that, since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing things in an unexpected order and still referencing things via esp.

Judging by the resulting assembly, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.
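To illustrate the two points above (the library side only forwards to a builtin, and the space has to be released when the function returns), here is a minimal sketch in C. It assumes GCC or Clang, where __builtin_alloca and __attribute__((noinline)) are available; my_alloca, fill_and_sum and repeated_calls are hypothetical names chosen for this example and are not taken from glibc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper name; __builtin_alloca is the GCC/Clang builtin
 * that lets the compiler know the stack pointer is being moved. */
#define my_alloca(size) __builtin_alloca(size)

/* noinline (a GCC/Clang attribute) keeps every call in its own frame,
 * so the block is released when this function returns. */
__attribute__((noinline))
static unsigned fill_and_sum(size_t n, unsigned char value)
{
    unsigned char *buf = my_alloca(n);  /* no explicit free anywhere */
    unsigned sum = 0;

    memset(buf, value, n);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* 10,000 calls of 64 KiB each would need roughly 640 MB if nothing were
 * released. The loop is expected to fit in an ordinary stack only because
 * the compiler restores the stack pointer on every return. */
static unsigned repeated_calls(void)
{
    unsigned total = 0;

    for (int i = 0; i < 10000; i++)
        total += fill_and_sum(64 * 1024, 1);
    return total;
}
```

The design point is that the cleanup code is generated by the compiler at the function's exits, which is exactly the part a hand-written "sub esp" cannot provide.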
It would be tricky to do this. In fact, unless you have enough control over the compiler's code generation, it cannot be done entirely safely. Your routine would have to manipulate the stack such that, when it returned, everything was cleaned up but the stack pointer was left in a position where the block of memory stayed reserved.
The problem is that unless you can inform the compiler that the stack pointer has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer, but the offsets will be incorrect.
For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.
The C and C++ standards don't specify that alloca() has to use the stack, because alloca() isn't in the C or C++ standards (or POSIX, for that matter)¹.

A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (referenced on their website), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.

Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.

(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression, so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)

¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").
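The stack-first, heap-as-fallback idea with an explicit release call can be sketched in portable C without compiler support. This is only an illustration of the pattern, not Microsoft's implementation: scratch_buf, scratch_acquire, scratch_release, sum_of_squares and the 256-byte threshold are all assumptions made up for this example.

```c
#include <stdlib.h>

#define SCRATCH_INLINE_BYTES 256  /* assumed threshold, chosen arbitrarily */

/* The caller declares one of these as a local, so inline_buf lives in
 * the caller's stack frame. */
typedef struct {
    void *ptr;
    int on_heap;
    unsigned long inline_buf[SCRATCH_INLINE_BYTES / sizeof(unsigned long)];
} scratch_buf;

/* Small requests use the in-frame buffer, large ones fall back to malloc. */
static void *scratch_acquire(scratch_buf *s, size_t n)
{
    if (n <= sizeof s->inline_buf) {
        s->ptr = s->inline_buf;
        s->on_heap = 0;
    } else {
        s->ptr = malloc(n);
        s->on_heap = (s->ptr != NULL);
    }
    return s->ptr;
}

/* Must be called explicitly, like the release half of an acquire/release pair. */
static void scratch_release(scratch_buf *s)
{
    if (s->on_heap)
        free(s->ptr);
    s->ptr = NULL;
    s->on_heap = 0;
}

/* Usage: sum of i*i for i in [0, count), using a temporary array. */
static unsigned long sum_of_squares(size_t count)
{
    scratch_buf s;
    unsigned long *v = scratch_acquire(&s, count * sizeof *v);
    unsigned long total = 0;

    if (v == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        v[i] = (unsigned long)i * i;
    for (size_t i = 0; i < count; i++)
        total += v[i];
    scratch_release(&s);
    return total;
}
```

The explicit release is the price of not involving the compiler: nothing frees the heap block automatically when the function exits.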
Continuation Passing Style Alloca
Variable-length array in pure ISO C++. Proof-of-concept implementation: cps_alloca on GitHub.
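The general flavor of a continuation-passing approach can be sketched in plain C: rather than returning a buffer, a helper picks a fixed-size array in its own frame and passes it to a callback (the continuation), so the buffer is valid for exactly as long as the callback runs. This is a transposition of the idea under my own assumptions, not the code of the linked project; buffer_cont, with_stack_buffer, the three size classes, and the byte_sum usage are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Continuation: receives the scratch buffer, its capacity, and user context. */
typedef void (*buffer_cont)(char *buf, size_t cap, void *ctx);

/* One helper per size class; each owns a fixed-size array in its own frame. */
static void with_64(buffer_cont k, void *ctx)    { char buf[64];    k(buf, sizeof buf, ctx); }
static void with_1024(buffer_cont k, void *ctx)  { char buf[1024];  k(buf, sizeof buf, ctx); }
static void with_16384(buffer_cont k, void *ctx) { char buf[16384]; k(buf, sizeof buf, ctx); }

/* Runs k with a stack buffer of at least n bytes.
 * Returns 0 on success, -1 if n exceeds the largest size class. */
static int with_stack_buffer(size_t n, buffer_cont k, void *ctx)
{
    if (n <= 64)         with_64(k, ctx);
    else if (n <= 1024)  with_1024(k, ctx);
    else if (n <= 16384) with_16384(k, ctx);
    else return -1;
    return 0;
}

/* Usage: copy a string into the scratch buffer and sum its bytes. */
struct sum_job { const char *text; size_t len; int sum; };

static void sum_bytes(char *buf, size_t cap, void *ctx)
{
    struct sum_job *job = ctx;

    (void)cap;  /* the caller requested len bytes, so cap >= len */
    memcpy(buf, job->text, job->len);
    job->sum = 0;
    for (size_t i = 0; i < job->len; i++)
        job->sum += (unsigned char)buf[i];
}

static int byte_sum(const char *text)
{
    struct sum_job job = { text, strlen(text), -1 };

    if (with_stack_buffer(job.len, sum_bytes, &job) != 0)
        return -1;
    return job.sum;
}
```

The trade-off is that the code using the buffer has to live in the callback, and the size is rounded up to a size class instead of being exact.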
alloca is implemented directly in assembly code.
That's because you cannot control the stack layout directly from high-level languages.
Also note that most implementations perform some additional work, such as aligning the stack for performance reasons.
The standard way of allocating stack space on x86 is "sub esp, XXX", where XXX is the number of bytes to allocate.
Edit:
If you want to look at the implementation (and you're using MSVC), see alloca16.asm and chkstk.asm.
The code in the first file basically aligns the desired allocation size to a 16-byte boundary. The code in the second file walks all pages that would belong to the new stack area and touches them. This may trigger the PAGE_GUARD exceptions that the OS uses to grow the stack.
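The two steps described above (rounding the size up to a 16-byte multiple, then touching every page of the new area from the top down) can be sketched in C. This is an illustration of the arithmetic only, not the contents of alloca16.asm or chkstk.asm; the function names and the 4096-byte page size are assumptions, and the probe runs over an ordinary array instead of the real stack.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Round n up to the next multiple of 16. */
static size_t align_up_16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Read one byte per page of [base, base + len), walking from the highest
 * address down, the direction in which a stack grows on x86.
 * Returns the number of probes performed. */
static size_t probe_region(const volatile char *base, size_t len)
{
    size_t probes = 0;
    size_t off = len;

    while (off > 0) {
        (void)base[off - 1];  /* volatile read: the "touch" */
        probes++;
        off = (off > PAGE_BYTES) ? off - PAGE_BYTES : 0;
    }
    if (len > 0) {
        (void)base[0];  /* make sure the lowest page is touched too */
        probes++;
    }
    return probes;
}

/* Usage: probe a 10000-byte local array. */
static size_t demo_probe(void)
{
    char local[10000] = {0};

    return probe_region(local, sizeof local);
}
```

Probing in descending order matters on a real stack, because the guard page sits just below the currently committed region and has to be hit before anything beneath it.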
You can examine the sources of an open-source C compiler, such as Open Watcom, and find out for yourself.
If you can't use C99's variable-length arrays, you can use a compound literal cast to a void pointer.
This also works with -ansi (as a gcc extension), and even when it is a function argument.
The downside is that when compiled as C++, g++ > 4.6 gives the error "taking address of temporary array", though clang and icc don't complain.
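A minimal sketch of the compound-literal trick, compiled as C. STACK_BUF and the two helper functions are hypothetical names for this example. It assumes the size is a constant expression, since a compound literal cannot have a variable-length array type, and the buffer only lives until the end of the enclosing block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: sz must be a constant expression. The literal has
 * automatic storage when used inside a function body. */
#define STACK_BUF(sz) ((void *)((char[sz]){0}))

/* The buffer bound to a local pointer. */
static size_t fill_and_measure(void)
{
    char *buf = STACK_BUF(32);  /* valid until this block ends */

    strcpy(buf, "stack scratch");
    return strlen(buf);
}

/* The same macro used directly as a function argument. */
static size_t as_argument(void)
{
    return strlen(strcpy(STACK_BUF(16), "abc"));
}
```

Unlike alloca(), the size here is fixed at compile time, so this is closer to an anonymous local array than to a true dynamic allocation.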
Alloca is easy: you just move the stack pointer up, then generate all the reads/writes to point to this new block.
What we want to do is something like this:
In assembly (Visual Studio 2017, 64-bit) it looks like:
Unfortunately, our return pointer is the last item on the stack, and we do not want to overwrite it. Additionally, we need to take care of alignment, i.e. round size up to a multiple of 8. So we have to do this:
I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).