Difference between Big Endian and Little Endian byte order
What is the difference between Big Endian and Little Endian byte order?
Both of these seem to be related to Unicode and UTF-16. Where exactly do we use this?
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234 as a string of bytes (0x00-0xFF):
Byte Index:      0  1
Big-Endian:     12 34
Little-Endian:  34 12
In order to decide whether a text uses UTF-16BE or UTF-16LE, the specification recommends prepending a Byte Order Mark (BOM), the character U+FEFF, to the string. So, if the first two bytes of a UTF-16 encoded text file are FE, FF, the encoding is UTF-16BE. For FF, FE, it is UTF-16LE.
A visual example: the word "Example" in different encodings (UTF-16 with BOM):
Byte Index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
ASCII:       45 78 61 6d 70 6c 65
UTF-16BE:    FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE:    FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
For further information, please read the Wikipedia pages on Endianness and UTF-16.
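To make the BOM rule concrete, here is a small Python sketch (Python 3.8+, for `bytes.hex` with a separator). It produces the byte sequences with the standard `utf-16-be`/`utf-16-le` codecs and checks the first two bytes against `codecs.BOM_UTF16_BE`/`codecs.BOM_UTF16_LE`. `detect_utf16_bom` is a hypothetical helper for illustration, not a library function, and it simply returns `None` when no BOM is present.

```python
import codecs
from typing import Optional


def detect_utf16_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess the UTF-16 byte order from a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    return None  # no BOM: the byte order must be known some other way


# The character U+1234 in each byte order (these codecs write no BOM).
print("\u1234".encode("utf-16-be").hex(" "))  # 12 34
print("\u1234".encode("utf-16-le").hex(" "))  # 34 12

# "Example" with the BOM (U+FEFF) prepended in each byte order.
be = codecs.BOM_UTF16_BE + "Example".encode("utf-16-be")
le = codecs.BOM_UTF16_LE + "Example".encode("utf-16-le")
print(be.hex(" "))  # fe ff 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
print(le.hex(" "))  # ff fe 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00

# Skip the 2-byte BOM and decode with whichever order it announces.
for data in (be, le):
    encoding = detect_utf16_bom(data)
    print(encoding, data[2:].decode(encoding))
```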
Ferdinand's answer (and the others) is correct, but incomplete.
Big Endian (BE) / Little Endian (LE) are not specific to UTF-16 or UTF-32.
They existed long before Unicode, and they determine how the bytes of multi-byte numbers are stored in the computer's memory. Which one is used depends on the processor.
If you have a number with the value 0x12345678, then in memory it will be represented as 12 34 56 78 (BE) or 78 56 34 12 (LE).
UTF-16 and UTF-32 happen to use 2-byte and 4-byte code units respectively, so in memory the order of those bytes follows the same ordering as any other number on that platform.
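A quick way to see this in Python is to serialize 0x12345678 explicitly in both orders and then in the machine's native order. This is a sketch assuming CPython: `int.to_bytes` with an explicit order gives the same result everywhere, while `struct.pack` with the `=` prefix and the generic `utf-16` codec use the byte order of whatever machine runs the code, as reported by `sys.byteorder`.

```python
import struct
import sys

value = 0x12345678

# Explicit byte orders: identical output on every machine.
print(value.to_bytes(4, "big").hex(" "))     # 12 34 56 78
print(value.to_bytes(4, "little").hex(" "))  # 78 56 34 12

# "=" asks struct for the native byte order with the standard 4-byte size,
# i.e. the layout this processor uses for an unsigned 32-bit integer.
native = struct.pack("=I", value)
print(sys.byteorder, native.hex(" "))  # e.g. "little 78 56 34 12" on x86-64

# Python's generic "utf-16" codec also follows the native order,
# writing a BOM first so a reader can tell which order was used.
print("A".encode("utf-16").hex(" "))  # "ff fe 41 00" on a little-endian machine
```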
UTF-16 encodes Unicode into 16-bit values. Most modern filesystems operate on 8-bit bytes. So, to save a UTF-16 encoded file to disk, for example, you have to decide which part of the 16-bit value goes in the first byte, and which goes into the second byte.
Wikipedia has a more complete explanation.
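The "which half goes first" decision can be sketched by hand with bit shifts. The helper names `split_u16` and `encode_utf16_bmp` are hypothetical, and the sketch assumes every character lies in the Basic Multilingual Plane (up to U+FFFF), so each character is exactly one 16-bit code unit; characters that would need surrogate pairs are rejected rather than encoded.

```python
def split_u16(unit: int, big_endian: bool) -> bytes:
    """Hypothetical helper: turn one 16-bit value into two bytes."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError(f"{unit:#x} does not fit in 16 bits")
    high, low = unit >> 8, unit & 0xFF  # most / least significant byte
    return bytes([high, low]) if big_endian else bytes([low, high])


def encode_utf16_bmp(text: str, big_endian: bool) -> bytes:
    # Assumes BMP-only text: one code unit per character, no surrogate pairs.
    return b"".join(split_u16(ord(ch), big_endian) for ch in text)


print(encode_utf16_bmp("Hé", big_endian=True).hex(" "))   # 00 48 00 e9
print(encode_utf16_bmp("Hé", big_endian=False).hex(" "))  # 48 00 e9 00
```

For BMP text the result matches Python's built-in `utf-16-be`/`utf-16-le` codecs; a real encoder would also handle characters above U+FFFF.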
little-endian: adj.
Describes a computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored ‘little-end-first’). The PDP-11 and VAX families of computers and Intel microprocessors and a lot of communications and networking hardware are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte.
big-endian: adj.
[common; From Swift's Gulliver's Travels via the famous paper On Holy Wars and a Plea for Peace by Danny Cohen, USC/ISI IEN 137, dated April 1, 1980]
Describes a computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored ‘big-end-first’). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs are big-endian. Big-endian byte order is also sometimes called network order.
---from the Jargon File: http://catb.org/~esr/jargon/html/index.html
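The "network order" remark can be illustrated with Python's standard library: in the `struct` module the `!` prefix means network byte order, which it defines as big-endian, and `socket.htons` converts a 16-bit value from host to network order. The port number 8080 below is only an illustrative value.

```python
import socket
import struct
import sys

port = 8080  # 0x1F90, an arbitrary example value

# "!" (network order) packs exactly like ">" (big-endian).
packed = struct.pack("!H", port)
print(packed.hex(" "))                    # 1f 90
print(packed == struct.pack(">H", port))  # True

# htons returns the 16-bit integer whose in-memory bytes are in network order:
# unchanged on a big-endian host, byte-swapped on a little-endian one.
print(sys.byteorder, hex(socket.htons(port)))  # e.g. "little 0x901f"
```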
Byte endianness (big or little) needs to be specified for UTF-16 because each UTF-16 code unit occupies two bytes, so there is a choice of whether to read/write the most significant byte first or last. (UTF-16 is also a variable-length encoding, since a character takes one or two 16-bit code units, but it is the multi-byte code unit that creates the byte-order question; the same applies to UTF-32 with its 4-byte code units.) Note, however, that UTF-8 code units are always one byte (8 bits) long, though a character can span several of them, so UTF-8 has no endianness problem. If the encoder and the decoder of a byte stream representing Unicode text don't agree on which convention is being used, the wrong character codes will be decoded. For this reason, either the endianness convention is known beforehand or, more commonly, a byte order mark is placed at the beginning of a Unicode text file/stream to indicate whether big- or little-endian order is being used.
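Here is a small Python sketch of what goes wrong when encoder and decoder disagree, and of why UTF-8 avoids the problem. It relies only on the standard `utf-16-be`, `utf-16-le`, `utf-16` and `utf-8` codecs; the generic `utf-16` decoder is used only on input that starts with a BOM, where it reads the byte order from the mark and drops it.

```python
encoded = "A".encode("utf-16-be")  # 00 41 (big-endian, no BOM)

# Decoding with the other convention reads 00 41 as 0x4100: a different character.
wrong = encoded.decode("utf-16-le")
print(hex(ord(wrong)))  # 0x4100

# With a BOM in front, the generic "utf-16" decoder picks the right order itself
# and strips the BOM from the result.
with_bom = "\ufeffA".encode("utf-16-be")  # fe ff 00 41
print(with_bom.decode("utf-16"))  # A

# UTF-8 code units are single bytes, so the byte sequence is the same everywhere.
print("é".encode("utf-8").hex(" "))  # c3 a9
```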
A neat way to remember which is which is to look at the words BIG ENDian and LITTLE ENDian.
Big Endian stores the BIGGEST END at the beginning. Just like it's spelt: BIG ENDian.
Little Endian stores the LITTLEST END at the beginning. Just like it's spelt: LITTLE ENDian.
By big and little I mean significance: big is the most significant, little is the least significant.
Example: the value 258 is 0x0102 in hex, so stored in two bytes it looks like 0102 in big endian and 0201 in little endian. Add 1 to it and it becomes 259 (big: 0103, little: 0301).
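The 258/259 example can be checked with Python's `int.to_bytes`, which takes the byte order as an argument; two bytes are assumed, matching the example. `bytes.hex()` without a separator prints the same four-digit notation used in the example.

```python
for value in (258, 259):
    big = value.to_bytes(2, "big")        # most significant byte first
    little = value.to_bytes(2, "little")  # least significant byte first
    print(value, big.hex(), little.hex())
# 258 0102 0201
# 259 0103 0301
```

Adding 1 changes only the least significant byte, which sits at the end in big endian and at the start in little endian.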
The least significant digit is the quickest to change; the most significant one takes a lot to change. Take 1,000,000: the zeros would all change many times before the 1 does. The 1 (the millions digit) is the most significant digit in that example; the zeros are less significant because it takes less to change them.
Analogy time:
The American way to write dates is like big endian (02/22 = Feb 22, where we put the more significant number [the month] first),
and the other way to write dates is like little endian (22/02 = 22 Feb, where they put the less significant number [the day] first, like writing a million as 000,000,1).
OPINION:
The best way to write dates would be YYYY/MM/DD-HH:MM:SS (using 24-hour time). There's no confusion, and it's perfect for sorting chronologically because the year is the most significant number here, then the month, then the day, then the hour, then the minute, and finally the second. This would be BIG ENDIAN.
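The sorting argument carries over to bytes: if fixed-width unsigned integers are stored big-endian, comparing their bytes left to right gives the same order as comparing the numbers, while little-endian bytes do not. The values and timestamps below are arbitrary examples, and the date strings are assumed to be zero-padded.

```python
values = [1, 256, 258, 2]

# Sort by the raw 2-byte encodings, compared byte by byte from the left.
by_big = sorted(values, key=lambda v: v.to_bytes(2, "big"))
by_little = sorted(values, key=lambda v: v.to_bytes(2, "little"))
print(by_big)     # [1, 2, 256, 258]  (numeric order)
print(by_little)  # [256, 1, 2, 258]  (not numeric order)

# Zero-padded, big-endian-style timestamps sort chronologically as plain strings.
stamps = ["2024/02/22-09:30:00", "2023/12/31-23:59:59", "2024/02/05-14:00:00"]
print(sorted(stamps))
```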