A question about assembly and computer programs

Posted 2024-11-29 15:21:49 · 822 words · 1 view · 0 comments

I read this article: http://en.wikipedia.org/wiki/Assembly_language

It says:

Take, for example, the instruction that tells an x86/IA-32 processor
to move an immediate 8-bit value into a register. The binary code for
this instruction is 10110 followed by a 3-bit identifier for which
register to use. The identifier for the AL register is 000, so the
following machine code loads the AL register with the data
01100001.[4]

10110000 01100001

It explains how it is easier to write it as:

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)

Now here are my question(s).

So, computer programs/executables are just binary data (0's and 1's)?

When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?

If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?

How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?

Comments (3)

病毒体 2024-12-06 15:21:49

So, computer programs/executables are just binary data (0's and 1's)?

Yes, just like images, videos and other data.

When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?

Yes. In this exact case the output will always be correct, because mov al, 61h always assembles to 0xB0 0x61 in 16-, 32- and 64-bit mode. (The Intel 64 and IA-32 Architectures Software Developer's Manuals, and most other places, write this as B0 61.) Note that 0xB0 0x61 = 0b10110000 0b01100001.

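You can find the encodings of the different instructions in Volume 2A. For example, the row for this instruction on page 3-644 reads "B0+ rb MOV r8, imm8 E Valid Valid Move imm8 to r8." (the columns are opcode, instruction, operand encoding, validity in 64-bit mode, validity in compatibility/legacy mode, and description).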
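To make the "B0+ rb" notation concrete, here is a minimal Python sketch that decodes only this one instruction form: the top five bits must be 10110, the low three bits select the 8-bit register, and the next byte is the immediate. The function name `decode_mov_r8_imm8` is made up for illustration, and the register table covers only the legacy registers (no REX prefix).

```python
# 3-bit register identifiers for 8-bit registers when no REX prefix is present.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_r8_imm8(code):
    """Decode a two-byte 'MOV r8, imm8' (opcode B0+rb followed by imm8)."""
    opcode, imm = code[0], code[1]
    if opcode >> 3 != 0b10110:
        raise ValueError("not a MOV r8, imm8 instruction")
    register = REG8[opcode & 0b111]  # low three bits pick the register
    return f"mov {register}, {imm:02x}h"


print(decode_mov_r8_imm8(bytes([0b10110000, 0b01100001])))  # mov al, 61h
```

Changing the low three bits of the first byte changes the register: 0xB1 0x61 decodes as mov cl, 61h.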

Other instructions have different meanings depending on whether they are interpreted in 16-, 32- or 64-bit mode. Consider this short sequence of bytes: 66 83 C0 04 41 80 C0 05

In 16-bit mode they mean:

00000000  6683C004          add eax,byte +0x4
00000004  41                inc cx
00000005  80C005            add al,0x5

In 32-bit mode they mean:

00000000  6683C004          add ax,byte +0x4
00000004  41                inc ecx
00000005  80C005            add al,0x5

And finally in 64-bit mode:

00000000  6683C004          add ax,byte +0x4
00000004  4180C005          add r8b,0x5

So instructions cannot always be disassembled correctly without knowing the context. (This does not even take into account that things other than code can reside in the text segment, or that code can do nasty things like generating code on the fly or modifying itself.)
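The mode dependence can be reproduced with a toy decoder. The following Python sketch handles only what the byte sequence above needs: the 66 operand-size prefix, a REX prefix (64-bit mode only), the one-byte INC (40 to 47, outside 64-bit mode), and register-direct ADD with an 8-bit immediate (opcodes 80 and 83). Everything else raises an error. The names `disassemble` and `reg_name` are illustrative, REX.W is ignored, and the immediate is printed as a raw byte without sign extension.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def reg_name(index, size, rex, rex_b):
    """Name of register `index` at the given operand size in bits."""
    if rex_b:  # REX.B extends the register number to r8..r15
        return f"r{8 + index}" + {8: "b", 16: "w", 32: "d"}[size]
    if size == 8:
        return (REG8_REX if rex else REG8)[index]
    return (REG16 if size == 16 else REG32)[index]


def disassemble(code, mode):
    """Decode a tiny x86 subset; `mode` is 16, 32 or 64."""
    out, i = [], 0
    while i < len(code):
        opsize = 16 if mode == 16 else 32      # default operand size
        rex = rex_b = False
        if code[i] == 0x66:                    # prefix toggles operand size
            opsize = 32 if mode == 16 else 16
            i += 1
        if mode == 64 and 0x40 <= code[i] <= 0x4F:
            rex, rex_b = True, bool(code[i] & 0x01)  # REX prefix, keep B bit
            i += 1
        op = code[i]
        if mode != 64 and 0x40 <= op <= 0x47:  # one-byte INC r16/r32
            out.append("inc " + reg_name(op & 7, opsize, False, False))
            i += 1
        elif op in (0x80, 0x83):               # ADD r/m, imm8
            modrm, imm = code[i + 1], code[i + 2]
            if modrm >> 3 != 0b11000:          # mod=11 (register), /0 (ADD)
                raise ValueError("only register-direct ADD is handled")
            size = 8 if op == 0x80 else opsize
            out.append(f"add {reg_name(modrm & 7, size, rex, rex_b)}, {imm:#x}")
            i += 3
        else:
            raise ValueError(f"unhandled opcode {op:#04x}")
    return out


code = bytes.fromhex("66 83 C0 04 41 80 C0 05")
for mode in (16, 32, 64):
    print(mode, disassemble(code, mode))
```

The same eight bytes come out as three instructions in 16- and 32-bit mode and as two in 64-bit mode, because 41 is an INC in the first two cases and a REX prefix in the third.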

If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?

Yes, in the sense that if the application contains the mov al, 61h instruction the file will contain the bytes 0xB0 and 0x61.
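As a quick check, here is a small Python sketch that writes those two bytes to a temporary file, reads the file back, and prints each byte as eight bits. The helper name `file_as_bits` is made up for illustration, and the temporary file stands in for a file holding only the raw instruction bytes; a real executable would also contain headers and other data around them.

```python
import os
import tempfile


def file_as_bits(path):
    """Return the file's contents as space-separated groups of 8 bits."""
    with open(path, "rb") as f:
        data = f.read()
    return " ".join(format(byte, "08b") for byte in data)


# Write the two bytes of 'mov al, 61h' to a scratch file, then dump it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0xB0, 0x61]))
bits = file_as_bits(tmp.name)
os.remove(tmp.name)
print(bits)  # 10110000 01100001
```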

How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?

Once the code has been loaded into memory (and the memory has the right permissions set), the operating system can simply jump to it or call it, and it runs. One thing to realize: although the operating system is just another program, it is a special one because it got to the processor first. It runs in a special supervisor (or hypervisor) mode that allows it to do things normal (user) programs aren't allowed to do, such as setting up preemptive multitasking, which makes sure running processes are interrupted automatically instead of having to yield on their own.

The first processor is also responsible for waking up the other cores/processors on a multi-core/multi-processor machine. See this SO question.

Calling code that you load yourself directly from C++ (I don't think it is possible in C# without resorting to unsafe/native code) requires platform-specific tricks. On Windows you probably want to look at VirtualProtect, and on Linux at mprotect(2). Or, perhaps more realistically, load the code from a file which is then mapped into memory, using either this process for Windows or mmap(2) for Linux.
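The same idea can be sketched without writing any C++, using Python's mmap and ctypes modules on a Unix-like x86 system. This is only an illustration under several assumptions: the machine is x86 or x86-64, the platform exposes `mmap.PROT_EXEC`, and the system allows a mapping that is writable and executable at the same time (hardened systems may refuse). In any other case the hypothetical helper `run_mov_al_61h` returns None instead of running anything. A `ret` byte (C3) is appended so that control returns to the caller.

```python
import ctypes
import mmap
import platform


def run_mov_al_61h():
    """Execute 'mov al, 61h; ret' from memory; return AL, or None if unsupported."""
    if platform.machine().lower() not in ("x86_64", "amd64", "i386", "i686"):
        return None                      # the bytes below are x86 machine code
    if not hasattr(mmap, "PROT_EXEC"):
        return None                      # e.g. Windows: use VirtualProtect there
    code = bytes([0xB0, 0x61, 0xC3])     # mov al, 61h ; ret
    try:
        buf = mmap.mmap(
            -1, mmap.PAGESIZE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None                      # the system refused an executable mapping
    buf.write(code)
    address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    # Treat the address as a C function returning an unsigned 8-bit value (AL).
    func = ctypes.CFUNCTYPE(ctypes.c_ubyte)(address)
    return func()


print(run_mov_al_61h())
```

On a system where the mapping is allowed, the call returns 97 (0x61), the value the instruction placed in AL.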

怎樣才叫好 2024-12-06 15:21:49

That's a lot of questions:

Yes, computer programs/executables are just binary data 0/1s.

Yes, the disassembler tries to make sense of the 0s and 1s, and it uses additional knowledge about the file format (an EXE usually follows the PE spec, a COM file follows a different one, etc.), the OS the binary is supposed to run on, the APIs available, and so on.
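As a small illustration of that "additional knowledge", a tool can often tell the container format from the first few bytes of the file. The sketch below checks two well-known signatures: "MZ" at the start of DOS/PE executables and 0x7F "ELF" at the start of ELF files. The function name `guess_executable_format` is made up, and a real tool would parse the headers much further.

```python
def guess_executable_format(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"MZ"):
        return "MZ (DOS/PE)"
    # No signature: could be a flat binary (such as a DOS .COM file) or not code at all.
    return "unknown"


print(guess_executable_format(b"MZ\x90\x00"))       # MZ (DOS/PE)
print(guess_executable_format(b"\x7fELF\x02\x01"))  # ELF
print(guess_executable_format(b"\xb0\x61"))         # unknown
```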

These two bytes (one instruction with a parameter) would read exactly like that, although what surrounds them depends on the program they are part of. As mentioned, different file types follow different specifications.

Usually the OS loads the file and processes its content according to the specification, for example by rearranging some memory areas. Then it marks the memory areas that contain executable code as executable and does a JMP or CALL to the address of the first instruction at the so-called entry point (again, this differs depending on the file format/specification at hand).

In C# you don't deal with assembly as a language but with "byte code" (IL instructions). You can emit or load those via Framework methods.
In C++ you could deal directly with assembly if you really want to, but that is not portable and could get complicated, so you usually only do it when the gain is really worth it (like a needed performance boost by a factor of 10).

且行且努力 2024-12-06 15:21:49

So, computer programs/executables are just binary data (0's and 1's)?

YES.

When viewed with a disassembler like OllyDbg it just tries to revert
those 0's and 1's back to some Assembly (Intel?) language and the
output is mostly correct?

YES. Except that if the binary data represents code for the CPU the disassembler is designed for, the output will be totally correct, not just 'mostly' correct.

If I have this 10110000 01100001 program on my SSD and I write a
C#/PHP/wtvr application that reads the contents of the file and output
them as bits, will I see these exact 10110000 01100001 figures?

YES

How does the operating system do the actual "execution"? How does it
tell the processor that "hey, take these bits and run them"?

The operating system is just a program like any other: it is a set of instructions being executed on the processor. Put simply, when the operating system executes the code, all it does is jump to the start address of where the code is situated, and the processor then begins executing whatever code is at that location.

Can I do that in C#/C++ directly?

Don't forget that C is compiled down to machine code before it is executed, and at the point it is executed it is no different from any other program that can run on a given CPU. Yes, you can, for example, use inline assembly to jump to a given memory location and execute the code there.
