Endianness, languages, hardware and network
We know that endianness is related to the way computers store data. Big-endian computer architectures include the IBM 370, the Motorola 68000 and Sun SPARC. Little-endian computers include the Intel series (80486, Pentium, etc.) and VAX.
Java is always big-endian because of the JVM.
Network traffic should always be big-endian because of the protocol.
- Do C, C++ and C# depend on the computer they are running on?
- Network traffic should always be big-endian because of the protocol. What if we don't call htons and htonl before we send? The data sent across will be little-endian if the sender is C++ on an Intel machine. Is that right?
- So we don't need to care about endianness (calling ntohl and htonl) if we know that all the clients and the server will use computers with the same architecture and the same programming language. Is that right?
Comments (4)
If you don't call htonl, then you will not be creating a valid packet on a little-endian machine.
You should still call htonl, etc. (or the equivalent in whichever language you're using), because even if you have a homogeneous environment today, it's almost certain that this will change in the future.
More specifically, you should always do the conversion as close to the interface as you can, and in one place. If you have endianness conversion calls strewn across your codebase, it becomes difficult to reason about whether your code is sane or not.
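As a rough illustration of converting in one place, right at the interface, the sketch below keeps every htonl/ntohl call inside two small functions. The Header struct, its fields, and the names encode_header and decode_header are hypothetical, and htonl/ntohl are assumed to come from the POSIX header <arpa/inet.h>.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX; assumed available on the target platform)
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical message header, used only for illustration.
struct Header {
    std::uint32_t length;
    std::uint32_t sequence;
};

// The only place where host-order values become network-order bytes.
std::array<unsigned char, 8> encode_header(const Header& h) {
    const std::uint32_t length_be = htonl(h.length);
    const std::uint32_t sequence_be = htonl(h.sequence);
    std::array<unsigned char, 8> out{};
    std::memcpy(out.data(), &length_be, 4);
    std::memcpy(out.data() + 4, &sequence_be, 4);
    return out;
}

// The only place where network-order bytes become host-order values.
Header decode_header(const std::array<unsigned char, 8>& in) {
    std::uint32_t length_be = 0;
    std::uint32_t sequence_be = 0;
    std::memcpy(&length_be, in.data(), 4);
    std::memcpy(&sequence_be, in.data() + 4, 4);
    return Header{ntohl(length_be), ntohl(sequence_be)};
}
```

The rest of the program would then work only with Header values in host order, so there is a single pair of functions to audit when reasoning about byte order.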
Data transferred between computers in binary form depends on endian ordering.
C, C++ and C# do not impose any requirements on endianness.
Network code should follow the protocol. Numbers are converted to the internal format after they are read in, and converted back to the protocol's format when they are written out. Internally they can be in any format.
Only worry about endianness when transferring binary data between computers, whether it is stored in files or transferred immediately.
Floating-point numbers suffer from similar problems.
Many languages do not care about endianness.
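To make the floating-point remark a little more concrete, here is a minimal sketch that writes a float as four bytes in a fixed (big-endian) order and reads it back. It assumes a 32-bit IEEE 754 float whose byte order in memory matches that of a 32-bit integer on the same machine. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes a 32-bit IEEE 754 float");

// Write a float as 4 bytes, most significant byte of its bit pattern first.
std::array<unsigned char, 4> float_to_be_bytes(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copies the bit pattern, no numeric conversion
    return {static_cast<unsigned char>(bits >> 24),
            static_cast<unsigned char>(bits >> 16),
            static_cast<unsigned char>(bits >> 8),
            static_cast<unsigned char>(bits)};
}

// Rebuild the float from 4 big-endian bytes.
float float_from_be_bytes(const std::array<unsigned char, 4>& bytes) {
    const std::uint32_t bits = (std::uint32_t{bytes[0]} << 24) |
                               (std::uint32_t{bytes[1]} << 16) |
                               (std::uint32_t{bytes[2]} << 8) |
                               std::uint32_t{bytes[3]};
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions, 1.0f has the bit pattern 0x3F800000, so it would be written as the bytes 3F 80 00 00 on any host.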
Strictly speaking, Java uses the same endianness as the hardware it is running on, but this is not visible to the JVM user, as you cannot access raw memory in Java.
In very abstract terms, the one and only time when you must be endian-aware and endian-specific is when you serialize data. This has a very precise meaning, which is actually covered to some extent by the C++ language standard:
Inside the main part of your program, data comes in variables of a certain type, written T x;. So far, so portable; your program always does what you want and you don't need to know how x is represented internally. You know that the memory for x starts at &x and is sizeof(T) bytes long, but you don't know anything else. If you did want to find out, you would have to cast &x from T* to unsigned char*.
While accessing an object through a pointer to an unrelated type is in general not allowed (it's called "type punning"), this particular cast is expressly permitted by the standard. Casting to char-pointer is the only way you can serialize your data from an opaque type T into a stream of actual bytes. It is precisely at this moment that you must know about endianness (or more generally, representation), because you must know in which order the byte stream makes up the internal representation of T.
For integral types you can do without casting pointers, but the interface is still at the conversion from byte stream to value.
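For integral types, one possible shape of such a conversion uses only shifts and masks, with no pointer casts. The sketch below is an illustration under the assumption of 8-bit bytes, and the function names are hypothetical.

```cpp
#include <cstdint>

// Store n into buf[0..3], least significant byte first (little-endian stream).
void store_u32_le(std::uint32_t n, unsigned char* buf) {
    buf[0] = static_cast<unsigned char>(n & 0xFFu);
    buf[1] = static_cast<unsigned char>((n >> 8) & 0xFFu);
    buf[2] = static_cast<unsigned char>((n >> 16) & 0xFFu);
    buf[3] = static_cast<unsigned char>((n >> 24) & 0xFFu);
}

// Store n into buf[0..3], most significant byte first (big-endian stream).
void store_u32_be(std::uint32_t n, unsigned char* buf) {
    buf[0] = static_cast<unsigned char>((n >> 24) & 0xFFu);
    buf[1] = static_cast<unsigned char>((n >> 16) & 0xFFu);
    buf[2] = static_cast<unsigned char>((n >> 8) & 0xFFu);
    buf[3] = static_cast<unsigned char>(n & 0xFFu);
}

// Rebuild a value from a little-endian stream.
std::uint32_t load_u32_le(const unsigned char* buf) {
    return std::uint32_t{buf[0]} | (std::uint32_t{buf[1]} << 8) |
           (std::uint32_t{buf[2]} << 16) | (std::uint32_t{buf[3]} << 24);
}

// Rebuild a value from a big-endian stream.
std::uint32_t load_u32_be(const unsigned char* buf) {
    return (std::uint32_t{buf[0]} << 24) | (std::uint32_t{buf[1]} << 16) |
           (std::uint32_t{buf[2]} << 8) | std::uint32_t{buf[3]};
}
```

Because the shifts operate on the value rather than on its memory layout, these functions behave the same on big-endian and little-endian hosts. Only the byte order of the stream is encoded in them.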
You will find the need to convert values into byte streams and vice versa when using operations like read and write, usually associated with files, streams or sockets.
Note that for integral values we never need to know about the endianness of the program itself; we only need to know the endianness that is used by the byte stream!
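For contrast, the permitted cast to unsigned char* mentioned earlier in this answer exposes the object representation, which does depend on the host. The following sketch assumes 8-bit bytes and a 4-byte std::uint32_t, and the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uint32_t) == 4, "this sketch assumes 8-bit bytes");

// Copy out the object representation of x, byte by byte, in memory order.
std::array<unsigned char, 4> object_bytes(const std::uint32_t& x) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&x);
    std::array<unsigned char, 4> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = p[i];
    }
    return out;
}

// True when the least significant byte comes first in memory.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    return object_bytes(probe)[0] == 1;
}
```

Writing the result of object_bytes directly to a socket would produce different byte streams on different hosts, which is exactly why serialization code has to pick the stream's byte order explicitly.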