I'm interested in writing an x86 disassembler as an educational project.
The only real resource I have found is Spiral Space's "How to write a disassembler". While this gives a nice high-level description of the various components of a disassembler, I'm interested in some more detailed resources. I've also taken a quick look at NASM's source code, but it is somewhat heavyweight to learn from.
I realize one of the major challenges of this project is the rather large x86 instruction set I'm going to have to handle. I'm also interested in basic structure, basic disassembler links, etc.
Can anyone point me to any detailed resources on writing an x86 disassembler?
Take a look at section 17.2 of the 80386 Programmer's Reference Manual (http://pdos.csail.mit.edu/6.828/2008/readings/i386/toc.htm). A disassembler is really just a glorified finite-state machine. The steps in disassembly are:
1. Check whether the current byte is an instruction prefix byte (F3, F2, or F0); if so, then you've got a REP/REPE/REPNE/LOCK prefix. Advance to the next byte.
2. Check whether the current byte is an address-size prefix (67). If so, decode addresses in the rest of the instruction in 16-bit mode if currently in 32-bit mode, or decode addresses in 32-bit mode if currently in 16-bit mode.
3. Check whether the current byte is an operand-size prefix (66). If so, decode immediate operands in 16-bit mode if currently in 32-bit mode, or decode immediate operands in 32-bit mode if currently in 16-bit mode.
4. Check whether the current byte is a segment override prefix (2E, 36, 3E, 26, 64, or 65). If so, use the corresponding segment register for decoding addresses instead of the default segment register.
5. The next byte is the opcode. If it is 0F, then it is an extended opcode, and you read the next byte as the extended opcode.
The opcode tells you the operation being performed. The arguments of the opcode can be decoded from the values of the Mod R/M, SIB, displacement, and immediate value. There are a lot of possibilities and a lot of special cases, due to the complex nature of x86. See the link above for a more thorough explanation.
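The prefix and opcode steps above can be sketched roughly as follows. This is only an illustration in Python, not code from the manual: the names `Prefixes` and `scan_prefixes_and_opcode` are made up for this example, the loop accepts prefixes in any order rather than in the fixed sequence listed above, and nothing after the opcode (Mod R/M, SIB, displacement, immediate) is decoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Prefix byte values as listed in the steps above.
REP_LOCK_PREFIXES = {0xF3: "rep/repe", 0xF2: "repne", 0xF0: "lock"}
SEGMENT_OVERRIDES = {0x2E: "cs", 0x36: "ss", 0x3E: "ds",
                     0x26: "es", 0x64: "fs", 0x65: "gs"}
ADDRESS_SIZE_PREFIX = 0x67
OPERAND_SIZE_PREFIX = 0x66


@dataclass
class Prefixes:
    """Hypothetical record of what the prefix scan found."""
    rep_or_lock: Optional[str] = None
    segment: Optional[str] = None
    address_bits: int = 32
    operand_bits: int = 32


def scan_prefixes_and_opcode(code: bytes, offset: int = 0,
                             default_bits: int = 32) -> Tuple[Prefixes, bytes, int]:
    """Return (prefixes, opcode bytes, offset of the byte after the opcode)."""
    other_bits = 16 if default_bits == 32 else 32
    found = Prefixes(address_bits=default_bits, operand_bits=default_bits)

    # Consume prefix bytes until something that is not a prefix shows up.
    while offset < len(code):
        byte = code[offset]
        if byte in REP_LOCK_PREFIXES:
            found.rep_or_lock = REP_LOCK_PREFIXES[byte]
        elif byte == ADDRESS_SIZE_PREFIX:
            found.address_bits = other_bits
        elif byte == OPERAND_SIZE_PREFIX:
            found.operand_bits = other_bits
        elif byte in SEGMENT_OVERRIDES:
            found.segment = SEGMENT_OVERRIDES[byte]
        else:
            break
        offset += 1

    if offset >= len(code):
        raise ValueError("ran out of bytes before reaching an opcode")

    # The next byte is the opcode; 0F means a second opcode byte follows.
    opcode = code[offset:offset + 1]
    offset += 1
    if opcode == b"\x0f":
        if offset >= len(code):
            raise ValueError("truncated extended opcode")
        opcode += code[offset:offset + 1]
        offset += 1
    return found, opcode, offset
```

For example, `scan_prefixes_and_opcode(bytes.fromhex("66b83412"))` reports a 16-bit operand size and the one-byte opcode B8, leaving the offset at the start of the immediate.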
I would recommend checking out some open source disassemblers, preferably diStorm, and especially its "disOps (Instructions Sets DataBase)" (use Ctrl+F to find it on the page).
The documentation itself is full of juicy information about opcodes and instructions; see https://code.google.com/p/distorm/wiki/x86_x64_Machine_Code
The data structures and decoding phases are explained, step by step, in https://code.google.com/p/distorm/wiki/diStorm_Internals
The original links are kept for historical reasons:
http://code.google.com/p/distorm/wiki/x86_x64_Machine_Code and http://code.google.com/p/distorm/wiki/diStorm_Internals
Start with some small program that has been assembled, and which gives you both the generated code and the instructions. Get yourself a reference with the instruction architecture, and work through some of the generated code with the architecture reference, by hand. You'll find that the instructions have a very stereotypical structure of inst op op op with varying number of operands. All you need to do is translate the hex or octal representation of the code to match the instructions; a little playing around will reveal it.
That process, automated, is the core of a disassembler. Ideally, you're probably going to want to construct an array of instruction structures internally (or externally, if the program is really large). You can then translate that array into the instructions in assembler format.
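As a rough illustration of that two-stage idea (decode into an array of instruction structures, then render them as text), here is a small Python sketch. The `Instruction` class and the `decode_one` and `disassemble` functions are hypothetical names for this example, and only a handful of one-byte 32-bit opcodes are recognised (nop, ret, push/pop of a register, and mov of a 32-bit immediate into a register); anything else is emitted as a raw `db` byte.

```python
from dataclasses import dataclass
from typing import List, Tuple

# 32-bit register names in their standard encoding order (0..7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


@dataclass
class Instruction:
    """Hypothetical instruction record: where it was, its bytes, its meaning."""
    offset: int
    raw: bytes
    mnemonic: str
    operands: Tuple[str, ...] = ()

    def text(self) -> str:
        return f"{self.mnemonic} {', '.join(self.operands)}".strip()


def decode_one(code: bytes, offset: int) -> Instruction:
    """Decode a single instruction from a tiny, illustrative opcode subset."""
    op = code[offset]
    one = code[offset:offset + 1]
    if op == 0x90:
        return Instruction(offset, one, "nop")
    if op == 0xC3:
        return Instruction(offset, one, "ret")
    if 0x50 <= op <= 0x57:  # push r32: register number in the low 3 bits
        return Instruction(offset, one, "push", (REG32[op - 0x50],))
    if 0x58 <= op <= 0x5F:  # pop r32
        return Instruction(offset, one, "pop", (REG32[op - 0x58],))
    if 0xB8 <= op <= 0xBF and offset + 5 <= len(code):  # mov r32, imm32
        imm = int.from_bytes(code[offset + 1:offset + 5], "little")
        return Instruction(offset, code[offset:offset + 5], "mov",
                           (REG32[op - 0xB8], f"0x{imm:x}"))
    # Unknown or truncated: keep the byte so the output stays in sync.
    return Instruction(offset, one, "db", (f"0x{op:02x}",))


def disassemble(code: bytes) -> List[Instruction]:
    """Stage 1: build the array of instruction structures."""
    result = []
    offset = 0
    while offset < len(code):
        insn = decode_one(code, offset)
        result.append(insn)
        offset += len(insn.raw)
    return result


# Stage 2: translate the array into assembler-style text.
for insn in disassemble(bytes.fromhex("55b82a000000905dc3")):
    print(f"{insn.offset:04x}  {insn.raw.hex():<12} {insn.text()}")
```

Keeping the decoded structures separate from the printing step makes it easy to change the output syntax later without touching the decoder.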
You need a table of opcodes to load from.
The fundamental lookup data structure is a trie; however, a table will do well enough if you don't care much about speed.
To get the base opcode type, do a begins-with (prefix) match of the instruction bytes against the table.
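A minimal Python sketch of such a trie lookup is shown below. The `OpcodeTrie` class is a made-up name for this example, and the table holds only five well-known opcodes; a real table would be loaded from a much larger opcode database.

```python
from typing import Optional, Tuple


class OpcodeTrie:
    """Byte-wise trie: each node is a dict keyed by the next opcode byte.

    The key None marks a node where a complete opcode ends.
    """

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, opcode: bytes, mnemonic: str) -> None:
        node = self.root
        for byte in opcode:
            node = node.setdefault(byte, {})
        node[None] = mnemonic

    def match(self, code: bytes, offset: int = 0) -> Optional[Tuple[str, int]]:
        """Return (mnemonic, opcode length) for the longest opcode that the
        bytes at `offset` begin with, or None if nothing matches."""
        node = self.root
        best = None
        pos = offset
        while pos < len(code) and code[pos] in node:
            node = node[code[pos]]
            pos += 1
            if None in node:
                best = (node[None], pos - offset)
        return best


# A tiny illustrative table.
trie = OpcodeTrie()
for opcode_hex, name in [("90", "nop"), ("c3", "ret"), ("0f05", "syscall"),
                         ("0fa2", "cpuid"), ("0f0b", "ud2")]:
    trie.insert(bytes.fromhex(opcode_hex), name)

print(trie.match(bytes.fromhex("0fa2c3")))
```

With a plain table instead of a trie, the same lookup would be a linear scan that tests `code.startswith(opcode, offset)` for each entry, which is slower but simpler.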
There are a few stock ways of decoding register arguments; however, there are enough special cases to require implementing most of them individually.
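One of those stock cases is the Mod R/M byte, which packs a 2-bit mod field, a 3-bit reg field, and a 3-bit r/m field. The Python sketch below, with hypothetical helper names, splits the byte and handles only the two simplest forms under 32-bit addressing: register-direct (mod = 3) and plain register-indirect (mod = 0 with r/m other than 4 or 5). The SIB and displacement forms are exactly the kind of special cases mentioned above, and this sketch deliberately rejects them.

```python
from typing import Tuple

# 32-bit register names in their standard encoding order (0..7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(modrm: int) -> Tuple[int, int, int]:
    """Split a Mod R/M byte into its (mod, reg, rm) bit fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def decode_rm_operand(modrm: int) -> str:
    """Render the r/m operand for the two simplest 32-bit forms only."""
    mod, _, rm = split_modrm(modrm)
    if mod == 3:                       # register-direct
        return REG32[rm]
    if mod == 0 and rm not in (4, 5):  # [reg]; rm=4 means SIB, rm=5 means disp32
        return f"[{REG32[rm]}]"
    raise NotImplementedError("SIB and displacement forms are not covered here")


def decode_mov(code: bytes) -> str:
    """Decode the two-byte forms of mov with opcode 89 or 8B (assumption:
    32-bit operand size, no prefixes, no SIB or displacement)."""
    opcode, modrm = code[0], code[1]
    _, reg, _ = split_modrm(modrm)
    rm_text = decode_rm_operand(modrm)
    if opcode == 0x89:  # mov r/m32, r32
        return f"mov {rm_text}, {REG32[reg]}"
    if opcode == 0x8B:  # mov r32, r/m32
        return f"mov {REG32[reg]}, {rm_text}"
    raise ValueError(f"opcode 0x{opcode:02x} is not handled by this sketch")


print(decode_mov(bytes.fromhex("89d8")))
print(decode_mov(bytes.fromhex("8b03")))
```

The same `split_modrm` result feeds many different opcodes; what changes per opcode is which field is the destination and whether the reg field is a register at all or an opcode extension.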
Since this is educational, have a look at ndisasm.
Check out the objdump sources: it's a great tool, it contains many opcode tables, and its sources can provide a nice base for making your own disassembler.