How to manually read/write .exe machine code?

Posted 2024-07-17 08:29:14

I am not well acquainted with compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.

I will narrow down the subject of this question to Win32 executables (.exe). When I open these files in a specialized viewer, I can find strings (usually 16 bits per character) scattered in various places, but the rest is just garbage. I suppose the unreadable part (the majority) is the machine code (or maybe resources, such as images etc...).

Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instruction?

How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACD See you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow, ACD See is also capable of producing EXEcutable presentations. How is that done?

How can I understand what goes on inside an EXE file?


Comments (13)

抚笙 2024-07-24 08:29:14

OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).

Generally speaking, CPU instructions are of variable length, some are one byte, others are two, some three, some four etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov" which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions, ones for handling 8-bit, 16-bit, 32-bit data, ones for moving data from different registers and so on.

You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.

The EXE format is specific to Windows. The entry-point (i.e. the first executable instruction) is usually found at the same place within the EXE file. It's all kind of difficult to explain all at once, but the resources I've provided should help cure at least some of your curiosity! :)

春夜浅 2024-07-24 08:29:14

You need a disassembler which will turn the machine code into assembly language. This Wikipedia link describes the process and provides links to free disassemblers. Of course, as you say you don't understand assembly language, this may not be very informative - what exactly are you trying to do here?

最单纯的乌龟 2024-07-24 08:29:14

You can use debug from the command line, but that's hard.

C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E            PUSH    CS
0D69:0001 1F            POP     DS
0D69:0002 BA0E00        MOV     DX,000E
0D69:0005 B409          MOV     AH,09
0D69:0007 CD21          INT     21
0D69:0009 B8014C        MOV     AX,4C01
0D69:000C CD21          INT     21
0D69:000E 54            PUSH    SP
0D69:000F 68            DB      68
0D69:0010 69            DB      69
0D69:0011 7320          JNB     0033
0D69:0013 7072          JO      0087
0D69:0015 6F            DB      6F
0D69:0016 67            DB      67
0D69:0017 7261          JB      007A
0D69:0019 6D            DB      6D
0D69:001A 206361        AND     [BP+DI+61],AH
0D69:001D 6E            DB      6E
0D69:001E 6E            DB      6E
0D69:001F 6F            DB      6F
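The first seven lines of that listing can be reproduced with a very small table-driven decoder. The following is only a sketch in Python: the `OPCODES` table and the `disassemble` helper are made up for illustration and cover just the six opcodes that appear in the listing, not the real x86 instruction set. It shows the core idea of a disassembler: read one opcode byte, look up how many operand bytes follow, and advance by that many. Note also that the bytes from offset 000E onward are not code at all. They are the ASCII text "This program canno...", which `debug` decodes as if it were instructions.

```python
# Hypothetical mini-decoder for the 16-bit opcodes seen in the listing above.
# Each entry maps an opcode byte to (mnemonic template, operand byte count).
OPCODES = {
    0x0E: ("PUSH CS", 0),
    0x1F: ("POP DS", 0),
    0xB4: ("MOV AH,{:02X}", 1),   # move an 8-bit immediate into AH
    0xB8: ("MOV AX,{:04X}", 2),   # move a 16-bit immediate into AX
    0xBA: ("MOV DX,{:04X}", 2),   # move a 16-bit immediate into DX
    0xCD: ("INT {:02X}", 1),      # software interrupt
}

# The first 14 bytes of the listing, copied from the hex column.
STUB = bytes.fromhex("0E1FBA0E00B409CD21B8014CCD21")


def disassemble(code: bytes):
    """Return a list of (offset, raw hex, text) tuples, one per instruction."""
    result = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if opcode not in OPCODES:
            # Unknown byte: emit it as data, the way debug prints "DB xx".
            result.append((offset, f"{opcode:02X}", f"DB {opcode:02X}"))
            offset += 1
            continue
        template, operand_len = OPCODES[opcode]
        raw = code[offset:offset + 1 + operand_len]
        # x86 stores multi-byte immediates least significant byte first.
        operand = int.from_bytes(raw[1:], "little")
        result.append((offset, raw.hex().upper(), template.format(operand)))
        offset += 1 + operand_len
    return result


if __name__ == "__main__":
    for offset, raw, text in disassemble(STUB):
        print(f"{offset:04X} {raw:<12} {text}")
```

Starting the decoder one byte off (for example at offset 0003) would produce different, meaningless instructions, which is why variable-length machine code has to be read from a known starting point.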
如梦初醒的夏天 2024-07-24 08:29:14

The executable file you see is in Microsoft's PE (Portable Executable) format. It is essentially a container, which holds some operating-system-specific data about a program and the program data itself, split into several sections. For example, code, resources and static data are stored in separate sections.

The format of the section depends on what is in it. The code section holds the machine code according to the executable target architecture. In the most common cases this is Intel x86 or AMD-64 (same as EM64T) for Microsoft PE binaries. The format of the machine code is CISC and originates back to the 8086 and earlier. The important aspect of CISC is that its instruction size is not constant, you have to start reading at the right place to get something valuable out of it. Intel publishes good manuals on the x86/x64 instruction set.
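To make the container idea concrete, here is a minimal Python sketch that walks the headers and lists the sections. The function name `parse_pe_headers` and the returned dictionary keys are illustrative choices, not part of any library. The field offsets follow the published PE layout: an "MZ" DOS header whose 4-byte field at offset 0x3C points to the "PE\0\0" signature, followed by a 20-byte COFF header, the optional header, and then one 40-byte entry per section.

```python
import struct


def parse_pe_headers(data: bytes):
    """Return (machine, sections) for a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # COFF header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics (20 bytes in total).
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )

    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + opt_size
    sections = []
    for i in range(n_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,      # bytes occupied in the file
            "raw_offset": raw_ptr,     # where those bytes start in the file
        })
    return machine, sections
```

Called as `parse_pe_headers(open(path, "rb").read())`, it returns the machine type (0x14C for 32-bit x86, 0x8664 for x64) and the section list. The machine code usually lives in the section named `.text`, starting at its `raw_offset` in the file.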

You can use a disassembler to view the machine code directly. In combination with the manuals you can guess the source code most of the time.

And then there's the MSIL EXE: .NET executables hold Microsoft's Intermediate Language. These do not contain machine-specific code, but .NET CIL code. The specifications for that are available online from ECMA.

These can be viewed with a tool such as Reflector.

这个俗人 2024-07-24 08:29:14

The contents of the EXE file are described in Portable Executable. It contains code, data, and instructions to OS on how to load the file.

There is a 1:1 mapping between machine code and assembly. A disassembler program will perform the reverse operation.

There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.

好多鱼好多余 2024-07-24 08:29:14

Just relating to this question: does anyone still read things like CD 21?

I remember Sandra Bullock in one show actually reading a screenful of hex numbers and figuring out what the program does. Sort of like the current version of reading Matrix code.

If you do read stuff like CD 21, how do you remember all the different combinations?

橘和柠 2024-07-24 08:29:14

Win32 exe format on MSDN

I'd suggest taking a bit of Windows C source code, building it, and debugging it in Visual Studio. Switch to the disassembly view and step over the commands. You can see how the C code has been compiled into machine code, and watch it run step by step.

如梦亦如幻 2024-07-24 08:29:14

If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help. You need to learn assembler programming first; study the architecture of the processor (plenty of documentation is downloadable from Intel). And then, since most machine code is generated by compilers, you'll need to understand how compilers generate code. The simplest way is to write lots of small programs and then disassemble them to see what your C/C++ is turned into.

A couple of books that'll help you understand:-

醉酒的小男人 2024-07-24 08:29:14

To get an idea, set a breakpoint on some interesting code, and then go to the CPU window.

If you are interested in more, it is easier to compile short fragments with Free Pascal using the -al parameter.

FPC can output the generated assembler in a multitude of assembler formats (TASM, MASM, GAS) using the -A parameter, and you can have the original Pascal code interleaved in comments (and more) for easy cross-reference.

Because it is compiler generated assembler, as opposed to assembler from disassembled .exe, it is more symbolic and easier to follow.

盗心人 2024-07-24 08:29:14

Familiarity with low level assembly (and I mean low level assembly, not "macros" and that bull) is probably a must. If you really want to read the raw machine code itself directly, usually you would use a hex editor for that. In order to understand what the instructions do, however, most people would use a disassembler to convert that into the appropriate assembly instructions. If you're one of the minority who wants to understand the machine language itself, I think you'd want the Intel® 64 and IA-32 Architectures Software Developer's Manuals. Volume 2 specifically covers the instruction set, which relates to your query about how to read machine code itself and how assembly relates to it.
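As a rough illustration of what a hex editor shows, this Python sketch prints bytes in the usual offset / hex / ASCII layout. The `hexdump` helper is written here for illustration and is not taken from any particular tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset  hex bytes  printable ASCII', one row per line."""
    pad = width * 3 - 1  # width of a full row of "XX XX XX ..." text
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Dump the first 256 bytes of the file named on the command line.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(hexdump(f.read(256)))
```

Run against an .exe, the first row starts with `4D 5A` ("MZ"). The right-hand column is where the scattered readable strings mentioned in the question show up.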

灯角 2024-07-24 08:29:14

Both your curiosity and your level of understanding are exactly where I was at one point. I highly recommend Code: The Hidden Language of Computer Hardware and Software. It will not answer all of the questions you ask here, but it will shed light on some of the utterly black-magic aspects of computers. It's a thick book but highly readable.

憧憬巴黎街头的黎明 2024-07-24 08:29:14

ACD See is probably taking advantage of the fact that .EXE files do no error checking on file length or on anything beyond the expected portion of the file. Because of this, you can make an .EXE file that opens itself and loads everything beyond a given point as data. This is useful because you can then make an .EXE that works on a given set of data just by tacking that data onto the end of a suitably written .EXE.

(I have no idea what exactly ACD See is, so take that with a big grain of salt, but I do know that some programs are generated that way.)
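A minimal sketch of that append-and-read-back idea, in Python. The trailer layout (an 8-byte tag plus a 64-bit length) and the names `MAGIC`, `append_payload` and `extract_payload` are assumptions invented for this example. Nothing here is known about how ACD See actually does it.

```python
import struct

# Hypothetical trailer written after the payload: 8-byte tag + payload length.
MAGIC = b"PAYLOAD1"
TRAILER = struct.Struct("<8sQ")


def append_payload(stub: bytes, payload: bytes) -> bytes:
    """Return the stub executable's bytes with the payload and trailer tacked on."""
    return stub + payload + TRAILER.pack(MAGIC, len(payload))


def extract_payload(image: bytes) -> bytes:
    """Recover the payload from an image produced by append_payload."""
    if len(image) < TRAILER.size:
        raise ValueError("file too short to hold a trailer")
    magic, length = TRAILER.unpack(image[-TRAILER.size:])
    end = len(image) - TRAILER.size
    if magic != MAGIC or length > end:
        raise ValueError("no appended payload found")
    return image[end - length:end]
```

A viewer program built this way would read its own file at startup, call something like `extract_payload` on the bytes, and display the result. The generator only has to copy the prebuilt viewer and append the images, so no compiler is involved when the slideshow is exported.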

許願樹丅啲祈禱 2024-07-24 08:29:14

Every instruction is in machine code kept in a special memory area within the CPU. Early Intel books gave the machine code for their instructions, so one should try to obtain such books in order to understand this. Apparently, machine code listings are not easily available today. What would be nice is a program which can reverse hex to machine code. Or do it manually, which is tedious!
