How to write self-modifying code in x86 assembly
I'm looking at writing a JIT compiler for a hobby virtual machine I've been working on recently. I know a bit of assembly (I'm mainly a C programmer; I can read most assembly with a reference for the opcodes I don't understand, and write some simple programs), but I'm having a hard time understanding the few examples of self-modifying code I've found online.
This is one such example: http://asm.sourceforge.net/articles/smc.html
The example program provided does about four different modifications when run, none of which are clearly explained. Linux kernel interrupts are used several times, and aren't explained or detailed. (The author moved data into several registers before calling the interrupts. I assume he was passing arguments, but these arguments aren't explained at all, leaving the reader to guess.)
What I'm looking for is the simplest, most straightforward example in code of a self-modifying program. Something that I can look at, and use to understand how self-modifying code in x86 assembly has to be written, and how it works. Are there any resources you can point me to, or any examples you can give that would adequately demonstrate this?
I'm using NASM as my assembler.
EDIT: I'm also running this code on Linux.
Comments (9)
Wow, this turned out to be a lot more painful than I expected. 100% of the pain was Linux protecting the program from being overwritten and/or from executing data.
Two solutions are shown below. A lot of googling was involved: the fairly simple part, putting some instruction bytes in memory and executing them, was mine; the mprotect call and the page-size alignment were culled from Google searches, things I had to learn for this example.
The self-modifying code itself is straightforward. If you take the program, or at least just the two simple functions, compile it and then disassemble it, you will get the opcodes for those instructions. Or use nasm to assemble blocks of assembler, etc. From this I determined the opcodes to load an immediate into eax and then return.
Ideally you simply put those bytes in some RAM and execute that RAM. To get Linux to allow that, you have to change the protection, which means you have to pass mprotect a pointer that is aligned on a page boundary. So allocate more than you need, find the address within that allocation that lies on a page boundary, call mprotect on that address, and then use that memory to hold your opcodes and execute them.
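The original listings are not reproduced here, so the following is only a minimal C sketch of the first approach as described above: over-allocate, round up to a page boundary, call mprotect, write the opcodes, and call them. The names `make_const_fn` and `const_fn` are hypothetical. It assumes an x86 or x86-64 Linux system whose security policy permits a writable and executable heap page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper type: a generated function taking no arguments. */
typedef int (*const_fn)(void);

/* Builds "mov eax, imm32; ret" in a page-aligned part of a malloc'd
 * buffer and returns it as a callable function (x86 / x86-64 only).
 * Returns NULL on failure. The buffer is intentionally never freed. */
const_fn make_const_fn(int32_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    /* Allocate more than needed so one whole aligned page fits inside. */
    unsigned char *raw = malloc((size_t)page * 2);
    if (raw == NULL)
        return NULL;

    /* Round up to the next page boundary within the allocation. */
    uintptr_t addr = ((uintptr_t)raw + (uintptr_t)page - 1)
                     & ~((uintptr_t)page - 1);
    unsigned char *code = (unsigned char *)addr;
    assert(code >= raw && code + page <= raw + 2 * page);

    /* mprotect requires a page-aligned start address. */
    if (mprotect(code, (size_t)page,
                 PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        free(raw);
        return NULL;
    }

    code[0] = 0xB8;                          /* mov eax, imm32 */
    memcpy(code + 1, &value, sizeof value);  /* little-endian immediate */
    code[5] = 0xC3;                          /* ret */

    return (const_fn)(uintptr_t)code;
}
```

Calling `make_const_fn(42)` and then invoking the returned pointer should yield 42 under those assumptions. Changing the protection of heap memory obtained from malloc works on Linux but is not guaranteed by POSIX; an anonymous mmap is the more portable choice.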
The second example takes an existing function compiled into the program. Again, because of the protection mechanism, you cannot simply point at it and change bytes; you have to unprotect it for writes. So you have to back up to the prior page boundary and call mprotect with that address and enough bytes to cover the code to be modified. Then you can change the bytes/opcodes of that function in any way you want (so long as you don't spill over into any function you want to continue to use) and execute it. In this case you can see that fun() works, then I change it to simply return a value, call it again, and now it has been modified.
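The second approach described in dwelch's answer (patching a function that is already compiled into the program) can be sketched roughly as follows. This is not the original listing. `fun`, `patch_fun`, and `call_fun` are hypothetical names; the nop padding and the volatile function pointer are assumptions added so that the six-byte patch fits inside the function and the compiler cannot inline or cache the call. It assumes GCC or Clang on x86 or x86-64 Linux, and a system that allows a text page to be made writable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical function to be patched. The nops pad the body so the
 * 6-byte patch cannot spill into whatever follows it in memory. */
__attribute__((noinline)) int fun(void)
{
    __asm__ volatile("nop; nop; nop; nop; nop; nop; nop; nop");
    return 1;
}

/* Overwrites the start of fun() with "mov eax, imm32; ret".
 * x86 / x86-64 only. Returns 0 on success, -1 on failure. */
int patch_fun(int32_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char patch[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    memcpy(patch + 1, &value, sizeof value);

    /* Back up to the page boundary at or before the function. */
    uintptr_t target = (uintptr_t)&fun;
    uintptr_t start = target & ~((uintptr_t)page - 1);
    size_t len = (size_t)(target - start) + sizeof patch;
    assert(len <= 2 * (size_t)page);

    /* Unprotect for writing while keeping the page executable. */
    if (mprotect((void *)start, len,
                 PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1;

    memcpy((void *)target, patch, sizeof patch);

    /* Put the usual read + execute protection back. */
    return mprotect((void *)start, len, PROT_READ | PROT_EXEC);
}

/* Calls fun() through a volatile pointer so the call is always made. */
int call_fun(void)
{
    int (*volatile fp)(void) = fun;
    return fp();
}
```

Under those assumptions, `call_fun()` returns 1 before `patch_fun(1234)` and 1234 afterwards. The page keeps PROT_EXEC throughout because the patching code may itself live on the same page.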
Since you're writing a JIT compiler, you probably don't want self-modifying code; you want to generate executable code at runtime. These are two different things. Self-modifying code is code that is modified after it has already started running. It carries a large performance penalty on modern processors, and would therefore be undesirable for a JIT compiler.
Generating executable code at runtime should be a simple matter of mmap()ing some memory with PROT_EXEC and PROT_WRITE permissions. You could also call mprotect() on some memory you allocated yourself, as dwelch did above.
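As a rough illustration of the mmap() route, here is a minimal C sketch. `jit_alloc` and `make_add10` are hypothetical names, and the hard-coded bytes stand in for whatever a real JIT would emit. It assumes Linux on x86-64 (System V calling convention) or 32-bit x86 (cdecl), and a system that permits mappings that are writable and executable at the same time.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical type for the generated function. */
typedef int (*jit_fn)(int);

/* Copies len bytes of machine code into a fresh anonymous mapping that
 * is readable, writable, and executable. Returns NULL on failure. */
void *jit_alloc(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    return mem;
}

/* "Generates" the equivalent of: int f(int x) { return x + 10; } */
jit_fn make_add10(void)
{
#if defined(__x86_64__)
    /* lea eax, [rdi + 10]; ret   (first argument arrives in edi) */
    static const unsigned char code[] = { 0x8D, 0x47, 0x0A, 0xC3 };
#elif defined(__i386__)
    /* mov eax, [esp + 4]; add eax, 10; ret   (argument on the stack) */
    static const unsigned char code[] = {
        0x8B, 0x44, 0x24, 0x04, 0x83, 0xC0, 0x0A, 0xC3
    };
#else
#error "this sketch only contains x86 machine code"
#endif
    /* mmap returns a data pointer; treating it as code is a POSIX-ism. */
    return (jit_fn)(size_t)jit_alloc(code, sizeof code);
}
```

With these assumptions, `make_add10()(5)` evaluates to 15. A stricter variant would map the memory with PROT_READ | PROT_WRITE only, write the code, and then switch it to PROT_READ | PROT_EXEC with mprotect() so the region is never writable and executable at once.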
I'm working on a self-modifying game to teach x86 assembly, and had to solve this exact problem. I used the following three libraries:
AsmJit + AsmTk for assembling: https://github.com/asmjit/asmjit + https://github.com/asmjit/asmtk
UDIS86 for disassembling: https://github.com/vmt/udis86
Instructions are read with Udis86, the user can edit them as a string, and then AsmJit/AsmTk is used to assemble the new bytes. These can be written back to memory, and as other users have pointed out, the write-back requires using VirtualProtect on Windows or mprotect on Unix to fix the memory page permissions.
The code samples are just a little long for StackOverflow, so I'll refer you to an article I wrote with code samples:
https://medium.com/squallygame/how-we-wrote-a-self-hacking-game-in-c-d8b9f97bfa99
A functioning repo is here (very light-weight):
https://github.com/Squalr/SelfHackingApp
This is written in AT&T assembly. As you can see from the execution of the program, the output changes because of the self-modifying code.
Compilation: gcc -m32 modify.s modify.c
The -m32 option is used because the example targets 32-bit machines.
Assembly:
C test program:
Output:
A slightly simpler example based on the example above. Thanks to dwelch, who helped a lot.
You can also look at projects like GNU lightning. You give it code for a simplified RISC-type machine, and it generates the correct machine code dynamically.
A very real problem you should think about is interfacing with foreign libraries. You will probably need to support at least some system-level calls/operations for your VM to be useful. Kitsune's advice is a good start to get you thinking about system-level calls. You would probably use mprotect to ensure that the memory you have modified becomes legally executable. (@KitsuneYMG)
Some FFI allowing calls to dynamic libraries written in C should be sufficient to hide a lot of the OS specific details. All these issues can impact your design quite a bit, so it is best to start thinking about them early.
This question is tagged with 'assembly' and 'x86' but not with 'C'. While the person who asked the question mentions they work mostly with C, this question is likely to be encountered by people looking for a pure assembly solution (including me in the past). Hence, this is my attempt at the simplest possible demonstration of a JIT program, heavily inspired by old_timer's answer but rewritten in pure assembly.
A much simpler solution than the ones given can be written thanks to the ELF extensions to the section directive (https://nasm.us/doc/nasmdoc8.html#section-8.9.2). These allow you to define custom sections and, in particular, one that is both writable and executable. Based on that insight, I wrote this (tested on Linux amd64):
Remember that all of the normal caveats about self-modifying code apply (it's dangerous, insecure, could burn down your house...)
EDIT: The previous version of this answer said that we needed 1 byte alignment in order to access any part of an instruction. This was incorrect; the code seems to work with an align value of both 1 and 16.
I've never written self-modifying code, although I have a basic understanding of how it works. Basically, you write the instructions you want to execute into memory and then jump there. The processor interprets the bytes you've written as instructions and tries to execute them. For example, viruses and anti-copy programs may use this technique.
Regarding the system calls, you were right: arguments are passed via registers. For a reference of Linux system calls and their arguments, just check here.
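To make the register-passing point concrete, here is a small C sketch that issues the write system call directly with inline assembly instead of going through libc. `raw_write` is a hypothetical helper, and the sketch assumes GCC or Clang on Linux. The 32-bit branch uses the same `int 0x80` mechanism that the article linked in the question relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */

/* Hypothetical helper: calls write(2) without the libc wrapper.
 * Returns the byte count, or a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    assert(buf != NULL || len == 0);
#if defined(__x86_64__)
    /* x86-64: syscall number in rax, arguments in rdi, rsi, rdx.
     * The syscall instruction clobbers rcx and r11. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_write), "D"((long)fd),
                       "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
#elif defined(__i386__)
    /* 32-bit x86: syscall number in eax, arguments in ebx, ecx, edx.
     * The kernel is entered through software interrupt 0x80. */
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "a"((long)SYS_write), "b"((long)fd),
                       "c"(buf), "d"(len)
                     : "memory");
#else
#error "this sketch only covers x86 and x86-64"
#endif
    return ret;
}
```

For example, `raw_write(1, "ok\n", 3)` writes to standard output and returns 3, while an invalid descriptor such as -1 produces a negative return value, because the raw kernel interface reports errors as negated errno codes rather than through `errno`.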