Question about assembly and computer programs

Posted 2024-11-29 15:21:49

I read this article: http://en.wikipedia.org/wiki/Assembly_language

It says:

Take, for example, the instruction that tells an x86/IA-32 processor
to move an immediate 8-bit value into a register. The binary code for
this instruction is 10110 followed by a 3-bit identifier for which
register to use. The identifier for the AL register is 000, so the
following machine code loads the AL register with the data
01100001.[4]
10110000 01100001

It explains how it is easier to write it as:

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)

Now here are my question(s).

So, computer programs/executables are just binary data (0's and 1's)?

When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?

If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and outputs them as bits, will I see these exact 10110000 01100001 figures?

How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?

病毒体 2024-12-06 15:21:49

So, computer programs/executables are just binary data (0's and 1's)?

Yes, just like images, videos and other data.

When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?

Yes, in this exact case it will always be correct as mov al, 61h is always assembled to 0xB0 0x61 (in Intel 64 and IA-32 Architectures Software Developer's Manuals and other places usually written as B0 61) in 16-, 32- and 64-bit mode. Note that 0xB0 0x61 = 0b10110000 0b01100001.

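You can find the encodings of the different instructions in Volume 2A. For example, the row for this instruction on page 3-644 reads: Opcode "B0+ rb", Instruction "MOV r8, imm8", Op/En "E", 64-Bit Mode "Valid", Compat/Leg Mode "Valid", Description "Move imm8 to r8."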
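The "B0+ rb" notation means the register number is added to the base opcode 0xB0. Since 0xB0 is 10110000 in binary, its low three bits are free, which is exactly the "10110 followed by a 3-bit register identifier" described in the question. The Python sketch below only illustrates that arithmetic; the helper names `encode_mov_r8_imm8`, `as_bits` and `REG8_NUMBER` are made up for this example, and the register numbers are the standard AL=0 through BH=7 codes for 8-bit registers without a REX prefix.

```python
# 8-bit register numbers used in the "+rb" part of the opcode (no REX prefix).
REG8_NUMBER = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_mov_r8_imm8(reg: str, imm: int) -> bytes:
    """Encode MOV r8, imm8 as (0xB0 + register number) followed by the immediate byte."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return bytes([0xB0 + REG8_NUMBER[reg], imm])


def as_bits(code: bytes) -> str:
    """Show each byte as 8 binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in code)


code = encode_mov_r8_imm8("al", 0x61)   # MOV AL, 61h
print(code.hex(" ").upper())            # B0 61
print(as_bits(code))                    # 10110000 01100001
```

Changing only the register changes only the low three bits of the first byte: `encode_mov_r8_imm8("bl", 0x61)` gives B3 61.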

Other instructions have different meanings depending on whether they are interpreted in 16/32- or 64-bit mode. Consider this short sequence of bytes: 66 83 C0 04 41 80 C0 05

In 16-bit mode they mean:

00000000  6683C004          add eax,byte +0x4
00000004  41                inc cx
00000005  80C005            add al,0x5

In 32-bit mode they mean:

00000000  6683C004          add ax,byte +0x4
00000004  41                inc ecx
00000005  80C005            add al,0x5

And finally in 64-bit mode:

00000000  6683C004          add ax,byte +0x4
00000004  4180C005          add r8b,0x5

So instructions cannot always be disassembled correctly without knowing the context (and this does not even take into account that things other than code can reside in the text segment, or that code can do nasty stuff like generate code on the fly or modify itself).
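To make the mode dependence concrete, here is a deliberately tiny decoder in Python. It is a hypothetical sketch, not how OllyDbg or any real disassembler works internally, and it understands only what the sample needs: the 0x66 operand-size prefix, REX prefixes in 64-bit mode, INC r16/r32 (0x40 to 0x47) and ADD r/m, imm8 (0x80 /0 and 0x83 /0) with a register operand. Anything else raises NotImplementedError, and the output uses a simplified syntax (`add eax, 0x4`) rather than the exact formatting of the listings above.

```python
# Register names indexed by the 3-bit (or REX-extended 4-bit) register number.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + [f"r{n}b" for n in range(8, 16)]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, mode: int) -> list:
    """Decode a tiny x86 subset in 16-, 32- or 64-bit mode into (offset, hex, text) tuples."""
    out = []
    i = 0
    while i < len(code):
        start = i
        opsize_prefix = code[i] == 0x66          # operand-size override prefix
        if opsize_prefix:
            i += 1
        rex = 0
        if mode == 64 and 0x40 <= code[i] <= 0x4F:
            rex = code[i]                        # in 64-bit mode 0x40-0x4F are REX prefixes
            i += 1
            if rex & 0x08:
                raise NotImplementedError("REX.W is outside this sketch")
        op = code[i]
        i += 1
        # Default operand size is 16 bits in 16-bit mode and 32 bits otherwise; 0x66 flips it.
        if mode == 16:
            regs = REG32 if opsize_prefix else REG16
        else:
            regs = REG16 if opsize_prefix else REG32
        if 0x40 <= op <= 0x47 and mode != 64:    # INC r16/r32
            text = f"inc {regs[op - 0x40]}"
        elif op in (0x80, 0x83):
            modrm = code[i]
            i += 1
            mod, reg_field, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 3 or reg_field != 0:
                raise NotImplementedError("only ADD with a register operand is handled")
            imm = code[i]
            i += 1
            if op == 0x80:                       # ADD r/m8, imm8
                dst = (REG8_REX if rex else REG8)[rm + 8 * (rex & 1)]
            else:                                # ADD r/m16/32, imm8 (sign-extended)
                if rex & 1:
                    raise NotImplementedError("REX.B with 0x83 is outside this sketch")
                dst = regs[rm]
                imm = imm - 0x100 if imm & 0x80 else imm
            text = f"add {dst}, {imm:#x}"
        else:
            raise NotImplementedError(f"opcode {op:#04x} is outside this sketch")
        out.append((start, code[start:i].hex().upper(), text))
    return out


SAMPLE = bytes.fromhex("66 83 C0 04 41 80 C0 05")
for mode in (16, 32, 64):
    print(f"{mode}-bit:")
    for offset, raw, text in decode(SAMPLE, mode):
        print(f"  {offset:08X}  {raw:<16}  {text}")
```

The same eight bytes yield three instructions in 16- and 32-bit mode (with eax/cx versus ax/ecx), but only two in 64-bit mode, because there 0x41 is consumed as a REX.B prefix that turns the 8-bit destination into r8b.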

If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?

Yes, in the sense that if the application contains the mov al, 61h instruction, the file will contain the bytes 0xB0 and 0x61 (alongside the headers and other data that make up the executable).
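A quick way to check this yourself, sketched here in Python (a C# or PHP version would follow the same idea): read the file in binary mode and print each byte as eight binary digits. The example writes just the two bytes to a temporary file; a real executable (PE on Windows, ELF on Linux) also contains headers and other data, so B0 61 would appear somewhere inside it rather than being the whole file. `file_as_bits` is a hypothetical helper name.

```python
import os
import tempfile


def file_as_bits(path: str) -> str:
    """Read a file in binary mode and render every byte as 8 binary digits."""
    with open(path, "rb") as f:  # "rb": raw bytes, no text decoding
        data = f.read()
    return " ".join(f"{byte:08b}" for byte in data)


# Write the two-byte "program" B0 61 to a temporary file, then read it back.
fd, path = tempfile.mkstemp(suffix=".bin")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(bytes([0xB0, 0x61]))
    bits = file_as_bits(path)
finally:
    os.remove(path)

print(bits)  # 10110000 01100001
```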

How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?

After loading the code into memory (with the memory's permissions set up correctly), the operating system can simply jump to it or call it and have it run. One thing you have to realize: even though the operating system is just another program, it is a special one, since it got to the processor first! It runs in a special supervisor (or hypervisor) mode that allows it to do things normal (user) programs aren't allowed to, such as setting up preemptive multitasking, which makes sure processes are automatically preempted (forced to yield the CPU).

The first processor is also responsible for waking up the other cores/processors on a multi-core/multi-processor machine. See this SO question.

Calling code you load yourself directly from C++ (I don't think it is possible in C# without resorting to unsafe/native code) requires platform-specific tricks. For Windows you probably want to look at VirtualProtect, and under Linux at mprotect(2). Or, perhaps more realistically, load the code from a file that is then mapped into memory using either this process for Windows or mmap(2) for Linux.
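As a rough illustration of the Linux route, the sketch below uses Python's `mmap` module plus `ctypes` (itself a way of "resorting to native code") in place of C++. It assumes Linux on an x86 CPU and does nothing elsewhere. It maps one page read/write, copies in a few bytes of machine code (`xor eax, eax`, `mov al, 61h`, `ret`, i.e. 31 C0 B0 61 C3), switches the page to read/execute with mprotect(2), and then calls the address as if it were a C function `int f(void)`. On Windows the equivalent steps would go through VirtualAlloc/VirtualProtect, which is not shown. `run_machine_code`, `CODE` and `SUPPORTED` are illustrative names.

```python
import ctypes
import mmap
import platform
import sys

# x86 machine code: xor eax, eax / mov al, 61h / ret  (returns 97 in EAX)
CODE = bytes([0x31, 0xC0, 0xB0, 0x61, 0xC3])

# Only try this where those bytes are native code and mprotect(2) exists.
SUPPORTED = sys.platform.startswith("linux") and platform.machine() in ("x86_64", "i686", "i386")


def run_machine_code(code: bytes) -> int:
    """Copy code into a fresh page, make it executable, and call it as int f(void)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
    libc.mprotect.restype = ctypes.c_int

    # 1. One page of ordinary read/write memory (not executable yet).
    page = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        page.write(code)
        anchor = ctypes.c_char.from_buffer(page)  # borrow a pointer to the page
        addr = ctypes.addressof(anchor)
        del anchor                                # release it so the page can be closed

        # 2. Flip the page from read/write to read/execute.
        if libc.mprotect(addr, mmap.PAGESIZE, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
            raise OSError(ctypes.get_errno(), "mprotect failed")

        # 3. "Jump" to the bytes by calling the address as a C function.
        func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
        return func()
    finally:
        page.close()


if SUPPORTED:
    print(run_machine_code(CODE))  # 97
```

Mapping the page writable first and only then making it executable (rather than writable and executable at once) mirrors the VirtualProtect/mprotect approach mentioned above. Some hardened systems may still refuse the mprotect call, in which case the sketch raises OSError.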
