Function body on heap
A program has three sections: text, data, and stack. The function body lives in the text section. Can we let a function body live on the heap? Because we can manipulate memory on the heap more freely, we may gain more freedom to manipulate functions.
In the following C code, I copy the text of the hello function onto the heap and then point a function pointer to it. The program compiles fine with gcc but gives a "Segmentation fault" when run.
Could you tell me why?
If my program cannot be repaired, could you provide a way to let a function live on the heap?
Thanks!
Turing.robot
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

void
hello()
{
    printf( "Hello World!\n");
}

int main(void)
{
    void (*fp)();
    int size = 10000; // large enough to contain hello()
    char* buffer;
    buffer = (char*) malloc ( size );
    memcpy( buffer,(char*)hello,size );
    fp = buffer;
    fp();
    free (buffer);
    return 0;
}
Comments (6)
My examples below are for Linux x86_64 with gcc, but similar considerations should apply on other systems.
Yes, absolutely we can. But usually that is called JIT (just-in-time) compilation. See this for the basic idea.
Exactly, that's why higher-level languages like JavaScript have JIT compilers.
Actually, you have multiple "Segmentation fault"s in that code.
The first one comes from this line:
memcpy( buffer,(char*)hello,size );
If you look at the x86_64 machine code that gcc generates for your hello function, it compiles down to a mere 17 bytes. So when you try to copy 10,000 bytes, you run into memory that does not exist and get a "Segmentation fault".
Secondly, you allocate memory with malloc, which gives you a slice of memory that the CPU protects against execution on Linux x86_64, so this would give you another "Segmentation fault".
Under the hood, malloc uses system calls like brk, sbrk, and mmap to allocate memory. What you need to do is allocate executable memory using the mmap system call with PROT_EXEC protection.
Thirdly, when gcc compiles your hello function, you don't really know what optimisations it will use and what the resulting machine code will look like.
For example, on line 4 of the disassembled hello function, gcc has optimised the call to use puts instead of printf, but that is not even the main problem.
On x86 architectures you normally call functions using the call assembly mnemonic. However, call is not a single instruction: there are many different machine instructions that it can assemble to; see the Intel manual, Vol. 2A, page 3-123, for reference.
In your case the compiler has chosen to use relative addressing for the call instruction.
You can see that because your call instruction has the e8 opcode.
This basically means that the instruction pointer will jump a relative number of bytes from the current instruction pointer.
Now, when you relocate your code to the heap with memcpy, you simply copy that relative call, which will now move the instruction pointer relative to the place on the heap where you copied your code. That target memory will most likely not exist, and you will get another "Segmentation fault".
Below is working code. Here is what I do:
Call printf once to make sure gcc includes it in our binary.
Allocate executable memory with mmap and the PROT_EXEC option.
Pass the printf function as an argument to our heap_function to make sure that gcc uses absolute jumps for the call instruction.
Here is the working code:
Save in main.c and run with:
In principle it is doable. However, you are copying from "hello", which basically contains assembly instructions that may call, reference, or jump to other addresses. Some of these addresses get fixed up when the application loads. Just copying that code and calling into it would then crash. Also, some systems like Windows have data execution prevention, which stops code stored as data from being executed, as a security measure. Also, how large is "hello"? Trying to copy past the end of it would likely also crash. And you are also dependent on how the compiler implements "hello". Needless to say, this would be very compiler and platform dependent, if it worked.
I can imagine that this might work on a very simple architecture or with a compiler designed to make it easy.
A few of the many requirements for this to work:
printf(), would work.
There are more requirements. Add to this the weirdness of doing this in what is likely to already be a highly complex dynamically linked environment (did you static link it?) and you simply are not ever going to get this to work.
And as Adam points out, there are security mechanisms in place, at least for the stack, to prevent dynamically constructed code from executing at all. You may need to figure out how to turn these off.
You might also be getting clobbered by the memcpy().
You might learn something by tracing this through step by step and watching it shoot itself in the head. If the memcpy hack is the problem, perhaps try something like:
Your program is segfaulting because you're memcpy'ing more than just "hello"; that function is not 10000 bytes long, so as soon as you get past hello itself, you segfault because you're accessing memory that doesn't belong to you.
You probably also need to use mmap() at some point to make sure the memory location you're trying to call is actually executable.
There are many systems that do what you seem to want (e.g., Java's JIT compiler creates native code in the heap and executes it), but your example will be way more complicated than that because there's no easy way to know the size of your function at runtime (and it's even harder at compile time, when the compiler hasn't yet decided what optimizations to apply). You can probably do what objdump does and read the executable at runtime to find the right "size", but I don't think that's what you're actually trying to achieve here.
After malloc you should check that the pointer is not null:
buffer = (char*) malloc ( size );
It might be your problem, since you are trying to allocate a big area of memory. Can you check that?
memcpy( buffer,(char*)hello,size );
hello is not a source that can be copied to buffer. You are cheating the compiler and it is taking its revenge at run time. By typecasting hello to char*, the program makes the compiler believe that it is one, which is not actually the case. Never try to outsmart the compiler.