Why isn't an EXE binary?
Why is it that if you open up an EXE in a hex editor, you see all sorts of things? If computers only understand binary, shouldn't there be only 2 possible symbols in the file? Thanks
Comments (8)
You're confusing content with representation. Every single file on your computer can be represented with binary (1s and 0s), and indeed that's how it's generally stored on disk (alignment of magnetic particles) or RAM (charge).
You're viewing your exe with a "hex editor", which represents the content using hexadecimal numbers. It does this because it's easier to understand and navigate hex than binary (compare "FA" to "11111010").
So the hexadecimal symbol "C0" represents the same value as the binary "11000000", "C1" == "11000001", "C2" == "11000010", and so on.
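As a rough illustration of content versus representation, here is a minimal Python sketch. The helper name show_byte is made up for this example. It prints one stored byte value in decimal, hexadecimal, and binary notation, and all three strings describe the same value.

```python
def show_byte(value: int) -> str:
    """Return one byte value written in decimal, hexadecimal, and binary."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    # Same number, three notations: only the formatting differs.
    return f"{value:3d}  0x{value:02X}  {value:08b}"


for v in (0xFA, 0xC0, 0xC1, 0xC2):
    print(show_byte(v))
```

The hex column is what a hex editor shows; the binary column is the same byte spelled out bit by bit.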
The hexadecimal values are just binary values in memory, interpreted for display. The software only makes them a bit more readable to human beings.
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10 A
1011 = 11 B
1100 = 12 C
1101 = 13 D
1110 = 14 E
1111 = 15 F
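The table above maps every 4-bit group to one hex digit, which is why a byte always fits in exactly two hex digits. A small sketch of that lookup, assuming Python and a made-up helper name byte_to_hex:

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(value: int) -> str:
    """Split a byte into two 4-bit halves and look each one up in the table."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    high = (value >> 4) & 0xF  # upper four bits
    low = value & 0xF          # lower four bits
    return HEX_DIGITS[high] + HEX_DIGITS[low]


# 1111 1010 -> F A
print(byte_to_hex(0b11111010))
```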
Computers don't only understand binary, that's a misconception. At the very lowest, lowest, lowest level, yes, data in digital computers is a series of 1s and 0s. But computer CPUs group those bits together into bytes, words, dwords, qwords, etc. The basic unit dealt with by a modern CPU is a dword or a qword, not a bit. That's why they're called 32-bit or 64-bit processors. If you want to get them to work with a single bit, you pretty much end up including 31 or 63 extraneous bits with it. (It gets a bit blurry when you start dealing with flag registers.)
Digital computers really came into their own as of 8-bit processors, so hexadecimal became a very useful display format as it succinctly represents a byte (8 bits) in two characters. You're using a hex editor, so it's showing you hex, and because of this early byte-orientation, it's showing you two characters for every 8 bits. It's mostly a display thing, though; there's little reason it couldn't show you one character for every 4 bits or four characters for every 16 bits, although file systems generally work on byte granularity for actual data (and much, much larger chunks for storage allocation granularity -- almost always 4k or more).
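To show that the grouping really is only a display choice, here is a hedged Python sketch. The function names hex_groups and bit_groups are invented for this example. It renders the same four bytes as two hex characters per byte, as four hex characters per 16 bits, and as raw bits.

```python
def hex_groups(data: bytes, bytes_per_group: int = 1) -> str:
    """Render data as hex, putting a space after every bytes_per_group bytes."""
    groups = [
        data[i:i + bytes_per_group].hex().upper()
        for i in range(0, len(data), bytes_per_group)
    ]
    return " ".join(groups)


def bit_groups(data: bytes) -> str:
    """Render the same data as groups of eight binary digits."""
    return " ".join(f"{b:08b}" for b in data)


sample = b"MZ\x90\x00"         # sample bytes chosen for illustration
print(hex_groups(sample))      # two characters per 8 bits
print(hex_groups(sample, 2))   # four characters per 16 bits
print(bit_groups(sample))      # the underlying bits
```

Nothing about the data changes between the three lines of output; only the way it is chopped up for the reader does.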
The character A you see here on the screen is just a pattern made of ones and zeros. Shared standards are what make those patterns of ones and zeros end up on the screen as something we can all understand. The character A has the value 65 in ASCII. In binary this is 0100 0001, but on the screen it is drawn as a pattern of pixels.
In an exe file a lot of stuff is stored in various formats: floats, integers and strings. These formats are often used because the computer can read them directly without further conversion. In a hex editor you will often be able to read strings that happen to be stored in the exe file.
In a computer everything's binary.
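A short Python sketch of the same idea: four bytes can be read as text, as an integer, or as a float, depending only on which format the reader assumes. The helper name interpret and the sample bytes are chosen for illustration, and the little-endian layout is an assumption.

```python
import struct


def interpret(raw: bytes) -> dict:
    """Read the same four bytes under three different format assumptions."""
    (as_int,) = struct.unpack("<I", raw)    # little-endian unsigned 32-bit integer
    (as_float,) = struct.unpack("<f", raw)  # little-endian 32-bit float
    return {
        "bits": " ".join(f"{b:08b}" for b in raw),
        "text": raw.decode("ascii"),
        "int": as_int,
        "float": as_float,
    }


print(ord("A"), format(ord("A"), "08b"))  # the character A as a number and as bits
for name, value in interpret(b"ABCD").items():
    print(name, value)
```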
There are only two possible states. What you're seeing is larger patterns of combinations of them, much in the same way that the only things sentences are made of are letters and punctuation.
Each character (byte) in the file represents 8 bits (8 ones or zeroes). You don't see bits, you see bytes (and larger types).
So I am going to give a layman's answer here. What others suggested above is correct: you can read binary through its hex representation. Most data is saved in a whole number of bytes anyway. It is possible that, for example, a compression algorithm computes a compressed representation in some odd number of bits, but it would still pad it to a full byte to save it. And each byte can be represented as 8 bits or 2 hex digits.
But this may not be what you asked. Quite likely you found some ASCII data inside the supposedly binary data. Why? Well, sometimes code is not just for running. Sometimes compilers include some bits of human-readable data that can help with debugging if the code were to crash and you needed to access the stack trace. Things like variable names, line numbers, etc.
Not that I ever had to do that. I don't have bugs in my code. That's right.
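If you want to pull that readable text out yourself, a rough Python sketch follows. The function name printable_strings, the minimum length of 4, and the sample bytes are all assumptions made for illustration. It scans a byte buffer for runs of printable ASCII.

```python
import re


def printable_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20 to 0x7E) at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for part of an executable.
blob = b"\x00\x01main.c\x00\xff\x10line 42\x00ab\x00"
print(printable_strings(blob))
```

To try it on a real file, read the file in binary mode (open(path, "rb")) and pass the bytes in.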
Don't forget about the operating system and the disk file system. They can only use files in their own formats. For example, executable files in Win32 must contain a PE header, which is located through the DOS "MZ" stub at the very start of the file. The operating system loads the executable into memory, resolves the API calls it imports, transfers control to it, and so on. The low-level instructions are executed by the CPU, and at that level an instruction may already be a sequence of bytes.
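As a rough sketch of what "files in their own formats" means for a Windows executable, the Python below checks the two signatures a loader looks for: the bytes "MZ" at offset 0, and the bytes "PE\0\0" at the offset stored in the 4-byte little-endian field at 0x3C. The function names and the fake image are made up for this example, and a real loader validates far more than this.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check only the MZ and PE signatures, nothing else."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 32-bit field at 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def make_fake_image() -> bytes:
    """Build a tiny buffer with only the two signatures filled in (not runnable)."""
    header = bytearray(0x80)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x80)
    return bytes(header) + b"PE\x00\x00"


print(looks_like_pe(make_fake_image()))
print(looks_like_pe(b"just some text, not an executable"))
```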