Questions about assembly and computer programs
I read this article: http://en.wikipedia.org/wiki/Assembly_language
It says:
Take, for example, the instruction that tells an x86/IA-32 processor
to move an immediate 8-bit value into a register. The binary code for
this instruction is 10110 followed by a 3-bit identifier for which
register to use. The identifier for the AL register is 000, so the
following machine code loads the AL register with the data
01100001.[4]
10110000 01100001
It explains how it is easier to write it as:
MOV AL, 61h ; Load AL with 97 decimal (61 hex)
Now here are my questions.
So, computer programs/executables are just binary data (0's and 1's)?
When viewed with a disassembler like OllyDbg, it just tries to revert those 0's and 1's back to some assembly (Intel?) language, and the output is mostly correct?
If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/whatever application that reads the contents of the file and outputs them as bits, will I see these exact 10110000 01100001 figures?
How does the operating system do the actual "execution"? How does it tell the processor "hey, take these bits and run them"? Can I do that in C#/C++ directly?
Comments (3)
Yes, like images, videos and other data.
Yes, in this exact case it will always be correct, as mov al, 61h is always assembled to 0xB0 0x61 (usually written as B0 61 in the Intel 64 and IA-32 Architectures Software Developer's Manuals and other places) in 16-, 32- and 64-bit mode. Note that 0xB0 0x61 = 0b10110000 0b01100001.
You can find the encoding for different instructions in Volume 2A. For example, the entry on page 3-644 reads "B0+ rb MOV r8, imm8 E Valid Valid Move imm8 to r8." (opcode, instruction, operand encoding, validity in 64-bit mode, validity in compatibility/legacy mode, description).
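The "B0+ rb" encoding can be sketched in a few lines of Python. This is only an illustration of the rule quoted above (opcode 0xB0 plus a 3-bit register number, followed by the immediate byte); the function name encode_mov_r8_imm8 and the REG8 table are made up for this example, and REX-prefixed registers are not covered.

```python
# Register numbers of the classic 8-bit registers (the low 3 bits of the opcode).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_mov_r8_imm8(reg, value):
    """Encode MOV r8, imm8 as two bytes: (0xB0 + register number), imm8."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    opcode = 0xB0 + REG8[reg.lower()]
    return bytes([opcode, value])


encoded = encode_mov_r8_imm8("al", 0x61)
print(encoded.hex(" "))                       # b0 61
print(" ".join(f"{b:08b}" for b in encoded))  # 10110000 01100001
```

Changing the register only changes the low three bits of the first byte, so mov bl, 5 comes out as B3 05.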
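Other instructions have different meanings depending on whether they are interpreted in 16-, 32- or 64-bit mode. Consider this short sequence of bytes:
66 83 C0 04 41 80 C0 05
In 16-bit mode they mean:
66 83 C0 04    add eax, 4
41             inc cx
80 C0 05       add al, 5
In 32-bit mode they mean:
66 83 C0 04    add ax, 4
41             inc ecx
80 C0 05       add al, 5
And finally in 64-bit mode:
66 83 C0 04    add ax, 4
41 80 C0 05    add r8b, 5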
So the instructions cannot always be disassembled correctly without knowing the context (and this is not even taking into account that things other than code can reside in the text segment, and that the code can do nasty stuff like generating code on the fly or modifying itself).
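To make the mode dependence concrete, here is a toy decoder for the byte sequence 66 83 C0 04 41 80 C0 05. It is a sketch, not a real disassembler: the function toy_disassemble is invented for this example and understands only the 66 operand-size prefix, the byte 41 (INC CX/ECX outside 64-bit mode, a REX.B prefix inside it), and the two ADD forms 83 C0 ib and 80 C0 ib.

```python
def toy_disassemble(code, mode):
    """Decode the example bytes for mode 16, 32 or 64 (toy subset only)."""
    default_size = 16 if mode == 16 else 32
    lines = []
    i = 0
    while i < len(code):
        size = default_size
        rex_b = False
        if code[i] == 0x66:                  # operand-size override prefix
            size = 32 if default_size == 16 else 16
            i += 1
        if mode == 64 and code[i] == 0x41:   # REX.B prefix exists only in 64-bit mode
            rex_b = True
            i += 1
        opcode = code[i]
        if opcode == 0x41 and mode != 64:    # INC CX / INC ECX
            lines.append("inc cx" if size == 16 else "inc ecx")
            i += 1
        elif opcode == 0x83 and code[i + 1] == 0xC0:   # ADD r/m16/32, imm8
            if rex_b:
                reg = "r8w" if size == 16 else "r8d"
            else:
                reg = "ax" if size == 16 else "eax"
            lines.append(f"add {reg}, {code[i + 2]}")
            i += 3
        elif opcode == 0x80 and code[i + 1] == 0xC0:   # ADD r/m8, imm8
            lines.append(f"add {'r8b' if rex_b else 'al'}, {code[i + 2]}")
            i += 3
        else:
            raise ValueError(f"byte {opcode:#04x} is outside this toy decoder")
    return lines


CODE = bytes.fromhex("66 83 C0 04 41 80 C0 05")
for mode in (16, 32, 64):
    print(mode, toy_disassemble(CODE, mode))
```

The same eight bytes decode to three instructions in 16- and 32-bit mode (with different register widths) and to two instructions in 64-bit mode, because 41 stops being an instruction of its own there.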
Yes, in the sense that if the application contains the mov al, 61h instruction, the file will contain the bytes 0xB0 and 0x61.
After loading the code into memory (with the memory permissions set up correctly), the operating system can just jump to it or call it to have it run. One thing you have to realize: even though the operating system is just another program, it is a special program, since it got to the processor first! It runs in a special supervisor (or hypervisor) mode that allows it to do things normal (user) programs aren't allowed to do, like setting up preemptive multitasking, which makes sure processes are interrupted automatically instead of having to yield voluntarily.
The first processor is also responsible for waking up the other cores/processors on a multi-core/multi-processor machine. See this SO question.
Calling code you load yourself directly in C++ (I don't think it is possible in C# without resorting to unsafe/native code) requires platform-specific tricks. For Windows you probably want to look at VirtualProtect, and under Linux at mprotect(2). Or, perhaps more realistically, the code comes from a file which is then mapped using either this process for Windows or mmap(2) for Linux.
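As a rough illustration of the "map executable memory, then call it" idea, the sketch below does it from Python instead of C++, using the standard mmap and ctypes modules. It makes several assumptions: it only attempts the call on x86-64 Linux, it appends a ret (C3) to the two bytes from the question so the call can return, and the function name run_mov_al_61 is made up. On other platforms, or if the system refuses an executable mapping, it returns None.

```python
import ctypes
import mmap
import platform
import sys

CODE = bytes([0xB0, 0x61, 0xC3])  # mov al, 61h ; ret


def run_mov_al_61():
    """Copy CODE into an executable mapping, call it, return AL (or None)."""
    if not (sys.platform.startswith("linux")
            and platform.machine() in ("x86_64", "AMD64")):
        return None  # the bytes are x86 machine code; do not run them elsewhere
    try:
        buf = mmap.mmap(-1, mmap.PAGESIZE,
                        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    except OSError:
        return None  # some hardened systems forbid writable+executable memory
    buf.write(CODE)
    anchor = ctypes.c_char.from_buffer(buf)      # used to obtain the address
    func = ctypes.CFUNCTYPE(ctypes.c_uint32)(ctypes.addressof(anchor))
    return func() & 0xFF                         # only AL is defined by the code


result = run_mov_al_61()
print("not run on this platform" if result is None else hex(result))
```

Only the low byte of the return register is meaningful here, since mov al, 61h leaves the rest of EAX untouched; that is why the result is masked with 0xFF.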
That's a lot of questions:
Yes, computer programs/executables are just binary data (0s and 1s).
Yes, the disassembler tries to make sense of the 0s and 1s, and it uses additional knowledge about the file format (EXE usually follows the PE spec, COM is a different spec, etc.), the OS the binary is supposed to run on, the APIs available, and so on.
These two bytes (one instruction with a parameter) would read exactly like that, although what surrounds them depends on the program they are part of: as mentioned, different file types follow different specifications.
Usually the OS loads the file and processes its content according to the specification, for example rearranging some memory areas. Then it marks the memory areas that contain executable code as executable and does a JMP or CALL to the address of the first instruction at the so-called entry point (again, this differs depending on the file format/specification at hand).
In C# you don't deal with assembly as a language but with "byte code" (IL instructions); you can emit or load those via Framework methods.
In C++ you could deal directly with assembly if you really want to, but that is not portable and could get complicated, so you usually only do that when the gain is really worth it (like a needed performance boost by a factor of 10).
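On the third question (reading the file back as bits), a minimal sketch in Python rather than C#/PHP looks like this. The helper name file_as_bits is made up, and the file here is a temporary one holding just the two bytes from the question; a real executable would also contain headers and other data around them.

```python
import os
import tempfile


def file_as_bits(path):
    """Return the file's bytes as space-separated groups of 8 bits."""
    with open(path, "rb") as f:
        data = f.read()
    return " ".join(f"{byte:08b}" for byte in data)


# Write the two machine-code bytes to a scratch file, then read them back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0xB0, 0x61]))
    path = tmp.name

bits = file_as_bits(path)
os.remove(path)
print(bits)  # 10110000 01100001
```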
Yes.
Yes. Moreover, if the binary data represents code for the CPU the disassembler is designed for, the output will be totally correct, not just 'mostly' correct.
Yes.
The operating system is just a program like any other: it is instructions being executed on the processor. Simplistically, when the operating system executes the code, all it does is jump to the start address of where the code is situated, and hence the processor begins executing whatever code is at that location.
Don't forget that C is compiled down to assembly language, and from there to machine code, before it executes, and at the point it is executed it is no different from any other program that could run on a given CPU. Yes, you can use inline assembly, for example, to jump to a given memory location and execute the code.
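Before the operating system can jump to the start address, it has to find that address in the file's headers. For the PE format mentioned in the answers, a simplified sketch of that lookup is shown below. The function names and the fake image are invented for illustration: the image is a 256-byte buffer holding only the fields the parser reads, not a runnable executable, and a real loader does far more (mapping sections, relocations, imports).

```python
import struct


def pe_entry_point_rva(image):
    """Return AddressOfEntryPoint (an RVA) from a PE image's optional header."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)   # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20    # skip signature and COFF file header
    (entry_rva,) = struct.unpack_from("<I", image, optional_header + 16)
    return entry_rva


def make_fake_image(entry_rva):
    """Build a header-only buffer for the demo (hypothetical, not loadable)."""
    image = bytearray(0x100)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<I", image, 0x80 + 4 + 20 + 16, entry_rva)
    return bytes(image)


print(hex(pe_entry_point_rva(make_fake_image(0x1000))))  # 0x1000
```

The value read is relative to the address where the image ends up in memory, so the loader adds the image base before transferring control.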