How to write self-modifying code in C?
I want to write a piece of code that changes itself continuously, even if the change is insignificant.
For example, maybe something like

    for i in 1 to 100, do
    begin
        x := 200
        for j in 200 downto 1, do
        begin
            do something
        end
    end

Suppose I want my code, after the first iteration, to change the line x := 200 to some other line x := 199, then after the next iteration change it to x := 198, and so on.
Is writing such code possible? Would I need to use inline assembly for that?
EDIT: Here is why I want to do it in C:
This program will be run on an experimental operating system, and I can't / don't know how to use programs compiled from other languages. The real reason I need such code is that it runs on a guest operating system in a virtual machine. The hypervisor is a binary translator that translates chunks of code. The translator does some optimizations. It translates each chunk of code only once. The next time the same chunk is used in the guest, the translator reuses the previously translated result. Now, if the code gets modified on the fly, the translator notices that and marks its previous translation as stale, thus forcing a re-translation of the same code. This is what I want to achieve: to force the translator to do many translations. Typically these chunks are the instructions between two branch instructions (such as jump instructions). I just think that self-modifying code would be a fantastic way to achieve this.
You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.
If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might change the operating system's protection on this memory range, allowing you to modify the bytes without encountering an Access Violation or `SIGSEGV`. Finally, you would use pointers (perhaps `unsigned char *` pointers, possibly `unsigned long *` as on RISC machines) to modify the opcodes of the compiled program.
A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.
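The steps above (get the address of some code bytes, change the page protection, overwrite the opcodes through a pointer) can be sketched roughly as follows. This is only an illustration under several assumptions: a POSIX system with `mmap` and `mprotect`, an x86 or x86-64 CPU, and a small buffer that the program fills with machine code itself rather than patching compiler-generated code, whose layout cannot be predicted. The function name `run_self_patching` is hypothetical. It mirrors the question's x := 200, 199, 198 idea by rewriting the immediate operand of one instruction before every call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

/* Calls a tiny generated function `count` times, rewriting its immediate
 * operand before each call, and stores every return value in out[].
 * Returns 0 on success, -1 on failure or on a non-x86 target. */
int run_self_patching(int start, int count, int *out)
{
#if defined(__x86_64__) || defined(__i386__)
    /* x86 machine code for: mov eax, imm32 ; ret */
    const unsigned char code[] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, code, sizeof code);

    int status = 0;
    for (int i = 0; i < count; i++) {
        int32_t x = start - i;
        /* Make the page writable, patch the operand, make it executable. */
        if (mprotect(buf, (size_t)page, PROT_READ | PROT_WRITE) != 0) {
            status = -1;
            break;
        }
        memcpy(buf + 1, &x, sizeof x); /* imm32 starts at byte offset 1 */
        if (mprotect(buf, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
            status = -1;
            break;
        }
        /* Converting a data pointer to a function pointer is not covered
         * by ISO C; it works on the platforms assumed here. */
        int_fn fn = (int_fn)(uintptr_t)buf;
        out[i] = fn();
    }
    munmap(buf, (size_t)page);
    return status;
#else
    (void)start; (void)count; (void)out;
    return -1; /* the byte sequence above is x86-specific */
#endif
}
```

Calling `run_self_patching(200, 3, out)` should leave 200, 199 and 198 in `out` on a supported target. On architectures with separate instruction caches an explicit cache flush would also be needed, which this sketch omits.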
Sorry, I am answering a bit late, but I think I found exactly what you are looking for: https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/
In this article, they change the value of a constant by injecting assembly in the stack. Then they execute shellcode by modifying the memory of a function on the stack.
Below is the first code:
It is possible, but it is almost certainly not portable, and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.
This would be a good start. Essentially Lisp functionality in C:
http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/
Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify the variable `x` in different ways as the loop index `i` changes. We could do something like this:
The output, when we compile and run the program, is:
Obviously, this will only work if you have a finite number of things you want to do with `x` on each run through. In order to make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I can really recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.
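A minimal sketch of the function-pointer idea described above. It is not the answer's original listing, which is not preserved here. The names `update_fn`, `decrement`, `halve`, `current_update` and `run_loop` are all hypothetical. The pointer is a file-scope static, so whichever function was selected last is still in place on the next call.

```c
typedef int (*update_fn)(int);

static int decrement(int x) { return x - 1; }
static int halve(int x)     { return x / 2; }

/* File-scope static: the "modification" survives across calls. */
static update_fn current_update = decrement;

/* Applies the currently selected operation to x once per iteration,
 * then swaps in the operation to use on the next pass. */
int run_loop(int x, int iterations)
{
    for (int i = 1; i <= iterations; i++) {
        x = current_update(x);
        /* "Rewrite" the loop body by pointing at another function:
         * halve after odd iterations, decrement after even ones. */
        current_update = (i % 2 == 1) ? halve : decrement;
    }
    return x;
}
```

For example, `run_loop(200, 3)` goes 199, 99, 98 and leaves `halve` selected, so a following `run_loop(10, 1)` returns 5. No machine code changes at any point, which is why this would probably not trigger the re-translation the question is after.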
A self-interpreting language (not hard-compiled and linked like C) might be better for that. Perl, JavaScript, and PHP have the evil `eval()` function that might be suited to your purpose. With it, you could have a string of code that you constantly modify and then execute via `eval()`.
The suggestion about implementing LISP in C and then using that is solid, due to portability concerns. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's bytecode into memory and then returning to it.
There are a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use `mprotect()` to make the code section writable, and then modify compiler-created functions.
Techniques like this are fun for programming challenges and obfuscated-code competitions, but given how unreadable your code would be, combined with the fact that you're exploiting what C considers undefined behavior, they're best avoided in production environments.
In standard C11 (read n1570), you cannot write self-modifying code (at least not without undefined behavior). Conceptually at least, the code segment is read-only.
You might consider extending the code of your program with plugins using your dynamic linker. This requires operating-system-specific functions. On POSIX, use `dlopen` (and probably `dlsym` to get newly loaded function pointers). You could then overwrite function pointers with the address of new ones.
Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.
Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.
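The `dlopen`/`dlsym` route mentioned above might look roughly like the sketch below. It assumes a POSIX system; the helper names `swap_in`, `apply_op` and `current_op` are hypothetical, and on older glibc versions the program may need to be linked with `-ldl`. A function pointer starts out pointing at built-in code and is then overwritten with the address of a function found in a shared library.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef double (*unary_fn)(double);

static double identity(double v) { return v; }

/* The pointer whose target gets replaced at run time. */
static unary_fn current_op = identity;

/* Loads `library`, looks up `symbol`, and makes current_op point at it.
 * Returns 0 on success, -1 if the library or the symbol is missing. */
int swap_in(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return -1;

    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        dlclose(handle);
        return -1;
    }
    /* ISO C has no conversion from void * to a function pointer; POSIX
     * requires it to work for dlsym results, hence this idiom. */
    *(void **)(&current_op) = addr;
    /* The handle is deliberately left open while current_op is in use. */
    return 0;
}

double apply_op(double v) { return current_op(v); }
```

As a usage example, on a glibc-based Linux system `swap_in("libm.so.6", "sqrt")` would be expected to succeed, after which `apply_op(9.0)` yields 3.0 instead of 9.0. The library file name is an assumption that varies between systems.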
My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.
This just requires leveraging two libraries -- an assembler, and a disassembler:
FASM assembler: https://github.com/ZenLulz/Fasm.NET
Udis86 disassembler: https://github.com/vmt/udis86
We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using `VirtualProtect` on Windows to change page permissions to allow editing the code. On Unix you have to use `mprotect` instead.
I posted an article on how we did it, as well as the sample code.
These examples are on Windows using C++, but it should be very easy to make them cross-platform and C only.
This is how to do it on Windows with C++. You'll have to `VirtualAlloc` a byte array with read/write protection, copy your code there, and then `VirtualProtect` it with read/execute protection. Here's how you dynamically create a function that does nothing and returns.
You assemble the code using this tool.
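The answer's own listing is not preserved here, so the following is only a rough C sketch of the sequence it describes: allocate read/write memory, copy the code in, then switch the memory to read/execute. It assumes an x86 or x86-64 CPU, where a single `ret` instruction (byte 0xC3) is a complete do-nothing function. The name `make_noop` is hypothetical, and a POSIX branch using `mmap` and `mprotect` is added alongside the Windows calls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

typedef void (*void_fn)(void);

/* Builds a function at run time that does nothing and returns.
 * Returns NULL on failure or on a non-x86 target. The memory is never
 * released in this sketch. */
void_fn make_noop(void)
{
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    static const unsigned char code[] = { 0xC3 }; /* x86: ret */
    const size_t size = 4096; /* assumption: one page is plenty */
#ifdef _WIN32
    void *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (buf == NULL)
        return NULL;
    memcpy(buf, code, sizeof code);
    DWORD old_protect;
    if (!VirtualProtect(buf, size, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(buf, 0, MEM_RELEASE);
        return NULL;
    }
    FlushInstructionCache(GetCurrentProcess(), buf, size);
#else
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    memcpy(buf, code, sizeof code);
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }
#endif
    /* Data-to-function pointer conversion is outside ISO C, but it is
     * what both platforms expect here. */
    return (void_fn)(uintptr_t)buf;
#else
    return NULL; /* the byte above is only valid x86 machine code */
#endif
}
```

A caller would check the result for NULL and then simply invoke it, e.g. `void_fn f = make_noop(); if (f) f();`. Keeping the memory writable and executable at different times, never both at once, also avoids trouble on systems that refuse such mappings.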
While "true" self-modifying code in C is impossible (the assembly way feels like a slight cheat, because at that point we're writing self-modifying code in assembly and not in C, which was the original question), there might be a pure C way to get a similar effect: statements that paradoxically do not do what you think they are supposed to do. I say paradoxically because both the ASM self-modifying code and the following C snippet might not make sense superficially or intuitively, but are logical if you put intuition aside and do a logical analysis, and that discrepancy is what makes a paradox a paradox.
First, we change the value of `foo.a` and `foo.b` and print the struct. Then we change only the value of `foo.a`, but observe the output.