How do I disassemble raw 16-bit x86 machine code?
I'd like to disassemble the MBR (first 512 bytes) of a bootable x86 disk that I have. I have copied the MBR to a file using
dd if=/dev/my-device of=mbr bs=512 count=1
Any suggestions for a Linux utility that can disassemble the file mbr?
You can use objdump. According to this article the syntax is:
The GNU tool is called objdump, for example:
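A minimal sketch of what such objdump invocations can look like for a raw 16-bit image. The file name sample.bin and its seven bytes are a hypothetical stand-in for the real mbr file; the flags are standard GNU objdump options, but an objdump built without x86 support will reject them.

```shell
# Stand-in for the real MBR image: 7 bytes of 16-bit code
# (cli; xor ax,ax; mov ds,ax; jmp $). Replace sample.bin with your mbr file.
printf '\372\061\300\216\330\353\376' > sample.bin

if command -v objdump >/dev/null 2>&1; then
    # Raw binary, i386 machine, forced to 16-bit addresses and operands.
    objdump -D -b binary -m i386 -M addr16,data16 sample.bin ||
        echo "this objdump build may lack x86 support" >&2
    # Same idea, selecting the 16-bit i8086 machine directly.
    objdump -D -b binary -m i8086 sample.bin ||
        echo "this objdump build may lack x86 support" >&2
else
    echo "objdump not found; it ships with GNU binutils" >&2
fi
```

Both forms print AT&T syntax by default.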
I like ndisasm for this purpose. It comes with the NASM assembler, which is free and open source and is included in the package repositories of most Linux distros.
Explanation, from the ndisasm manpage:
-b = Specifies 16-, 32- or 64-bit mode. The default is 16-bit mode.
-o = Specifies the notional load address for the file. This option causes ndisasm to get the addresses it lists down the left hand margin, and the target addresses of PC-relative jumps and calls, right.
-a = Enables automatic (or intelligent) sync mode, in which ndisasm will attempt to guess where synchronisation should be performed, by means of examining the target addresses of the relative jumps and calls it disassembles.
-s = Manually specifies a synchronisation address, such that ndisasm will not output any machine instruction which encompasses bytes on both sides of the address. Hence the instruction which starts at that address will be correctly disassembled.
mbr = The file to be disassembled.
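The options described above might be combined as in the sketch below. It assumes the image is a boot sector, which the BIOS loads at address 0x7c00. The sample.bin file is a hypothetical stand-in for the real mbr file, and -s is left out because a useful sync address depends on the particular MBR.

```shell
# Stand-in for the real MBR image: 7 bytes of 16-bit code
# (cli; xor ax,ax; mov ds,ax; jmp $). Replace sample.bin with your mbr file.
printf '\372\061\300\216\330\353\376' > sample.bin

if command -v ndisasm >/dev/null 2>&1; then
    # -b 16      decode as 16-bit code (also the default)
    # -o 0x7c00  list addresses as if the file were loaded at 0x7c00
    # -a         let ndisasm guess sync points from jump and call targets
    ndisasm -b 16 -o 0x7c00 -a sample.bin
else
    echo "ndisasm not found; it is usually part of the nasm package" >&2
fi
```

If the listing drifts out of step after a data area, adding -s with the address of a known instruction start forces ndisasm back into sync there.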
代替),这与以任一字节顺序存在的体系结构相关。starblue and hlovdal both have parts of the canonical answer. If you want to disassemble raw i8086 code, you usually want Intel syntax, not AT&T syntax, too, so use:
If your code is ELF (or a.out, or (E)COFF), you can use the short form:
For 32-bit or 64-bit code, omit the ,i8086; the ELF header already includes this information.
ndisasm, as suggested by jameslin, is also a good choice, but objdump usually comes with the OS and can deal with all architectures supported by GNU binutils (a superset of those supported by GCC), and its output can usually be fed into GNU as (ndisasm's can usually be fed into nasm though, of course).
Peter Cordes suggests that "Agner Fog's objconv is very nice. It puts labels on branch targets, making it a lot easier to figure out what the code does. It can disassemble into NASM, YASM, MASM, or AT&T (GNU) syntax."
Multimedia Mike already found out about --adjust-vma; the ndisasm equivalent is the -o option.
To disassemble, say, sh4 code (I used one binary from Debian to test), use this with GNU binutils (almost all other disassemblers are limited to one platform, such as x86 with ndisasm and objconv):
The -m is the machine, and -EL means little-endian (for sh4eb use -EB instead), which is relevant for architectures that exist in either endianness.
Try this command:
If you're just looking to use a disassembler, then objdump is one choice. The disassembler that comes with the nasm assembler is ndisasm. You can also run "debug.exe" in DOSBox on Linux, provided you can get hold of a copy of the program. It does disassembly as well as controlled execution, i.e. simulation of the CPU itself - which matters even when doing disassembly, for reasons I'm about to describe.
Fake86 has a CPU emulator. You may be able to hack it into doing disassembly by (a) having it show each instruction instead of simulating it, (b) having it not take conditional jumps or invoke calls, but instead stack the target address as a new entry point to disassemble from (in effect, taking both branches and encapsulating subroutines), (c) having it stop the current disassembly at an unconditional jump or return, (d) having it accept one, two or more entry points to start with, ideally (e) having it also accept base addresses for data segments, and (f) getting it to hex-dump all the areas left unprocessed as data or code segments (these are usually where indirect jumps, indirect calls or indirectly-accessed data land).
This gets to the other sense of your query: "I want to make a disassembler". The source for ndisasm is available, and it handles many of the descendants of the 8086, not just the 8086 itself (which seriously clutters it if all you want is an 8086 or even 80386 disassembler), but it is not self-contained and depends heavily on the rest of the distribution.
Its main talking point is that it uses octal digits for the opcodes - which better fits the 80x86 - as I pointed out on USENET in 1995 in comp.lang.asm ... and (in fact) nasm's creation was a direct response to that. So it's potentially more transparent, and you may want to keep the source handy as a check and comparison if you're making your own disassembler.
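One way to see why octal suits the 80x86: the ModRM byte is laid out as 2 + 3 + 3 bits, so its three octal digits are exactly the mod, reg and r/m fields. The helper below is a hypothetical illustration of that split, not code taken from nasm.

```shell
# Split one byte into its three octal digits. For a ModRM byte these are
# the mod, reg and r/m fields (2 + 3 + 3 bits).
octal_fields() {
    byte=$(( $1 ))
    echo "$(( byte >> 6 )) $(( (byte >> 3) & 7 )) $(( byte & 7 ))"
}

# 0xd8 is the ModRM byte of "mov ds, ax" (8E D8):
# mod=3 (register operand), reg=3 (DS), r/m=0 (AX).
octal_fields 0xd8       # prints "3 3 0"
printf '%03o\n' 0xd8    # prints "330"
```

Read in hex, 0xd8 hides those fields; read in octal, each digit is one field.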
You can also run the debug.exe program on itself.
You could also try running ndisasm on debug.exe: strip the 0x200-byte .EXE file header to make it a raw binary, after first extracting the entry point address CS:IP and the stack pointer address SS:SP from it (80x86 stacks grow down, so the stack segment is nominally SS:0 to SS:(SP-1)). The EXE for debug.exe has no relocations, so treating the code as a raw binary is fine.
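The header-stripping step can be sketched with od and dd. The field offsets are those of the standard DOS MZ header. The function names mz_field and mz_strip and the output name debug.bin are hypothetical. The sketch reads the header size from the file rather than hard-coding 0x200.

```shell
# Print the little-endian 16-bit field at byte offset $2 of file $1.
# Reading two single bytes keeps this independent of host endianness.
mz_field() {
    set -- $(od -An -v -tu1 -j "$2" -N 2 "$1")
    echo $(( $1 + 256 * $2 ))
}

# Report the entry point and stack from the MZ header of $1, then write
# everything after the header to $2 as a raw binary.
mz_strip() {
    hdr=$(mz_field "$1" 8)    # header size in 16-byte paragraphs
    echo "relocations: $(mz_field "$1" 6)"
    echo "CS:IP = $(mz_field "$1" 22):$(mz_field "$1" 20)"
    echo "SS:SP = $(mz_field "$1" 14):$(mz_field "$1" 16)"
    dd if="$1" of="$2" bs=16 skip="$hdr" 2>/dev/null
}

if [ -f debug.exe ]; then
    mz_strip debug.exe debug.bin
fi
```

The values are printed in decimal, and CS and SS are relative to the segment at which the image is loaded. The resulting debug.bin can then be passed to ndisasm like any other raw 16-bit file.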
But you won't get anything that's clearly recognizable, since the program is self-modifying - more precisely: self-extracting. You'll get a (barely) compressed code image (about 5/6 compression ratio) followed by a loader routine.
You have to run emulation on it, e.g. by running debug.exe on debug.exe to emulate its unpacking routine, to get it to extract itself, and then you dump the unpacked program image and disassemble that. There is a "relocation table" at the end of the loader routine, so it does actually have relocations in it - it's just that they're applied when the program unpacks itself, rather than by the OS when the EXE file is loaded.
And then you've just disassembled a disassembler that also happens to do CPU emulation, like Fake86 does - but only for the 8086. You'll have to make the absolute addresses relative (using the original relocation table as a guide) to make it re-assemblable. Once you do that, you can work on the source. The opcode table is in clear view (if you display it as text) in both the packed and unpacked versions of debug.exe.
There's also DosDebug up on GitHub. It handles everything up to "80586" (or "Pentium") and "80686": it flags a generation "6" for some instructions; e.g. the conditional "cmov" operations are handled by it, as well as their "fcmov" floating point versions. DosDebug is written in 8086 assembly and is best suited to being assembled with jwasm. You might be able to run nasm on it; I don't know, I never tried.
I might port the DAS disassembler to the x86, since items (a)-(f) are already incorporated into DAS's design. I've only ever ported it to the 8051, 6800, 6809 and 8080/8085 (and Z80) up to now, but the transition from 8085 to 8086 is relatively small. To that end, I might hack something out of Fake86. That's mostly abandonware now, since the author replaced it with XTulator, as Fake86 was written when the programmer was relatively new to C. You might also be able to hack something directly out of DosDebug's opcode tables (their "instr.*" files).