Endianness, languages, hardware, and networks

Posted 2024-11-27 16:00:03

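We know that endianness is about how a computer stores data. Big-endian architectures include the IBM 370, the Motorola 68000, and Sun SPARC. Little-endian machines include the Intel x86 family (80486, Pentium, and so on) and the VAX.

Because of the JVM, Java is always big-endian. Because of the protocols, the network should always be big-endian.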

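1. Do C, C++, and C# depend on the machine they run on?
2. Because of the protocols, the network should always be big-endian. What happens if we do not call htons and htonl before sending? If the sender is a C++ program on an Intel machine, the data sent will be little-endian. Is that right?
3. So if we know that every client and server runs on the same architecture and is written in the same language, we do not need to care about endianness (that is, call ntohl and htonl). Is that right?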

知足的幸福 2024-12-04 16:00:03

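1. For C and C++ at least, yes; byte order generally depends on the machine (but may also depend on the compiler). For C#, I don't know.
2. Yes, many network protocols are big-endian. If you do not call htonl, you will not create valid packets on a little-endian machine.
3. So you should always call htonl and friends (or the equivalent functions in whatever language you use), because even if the environment is homogeneous today, that will almost certainly change in the future.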

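More specifically, you should always do the conversion as close to the interface as possible, and in one place. If byte-order conversion calls are scattered throughout your codebase, it becomes hard to reason about whether the code is correct.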
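As a rough illustration of "convert in one place, at the interface", here is a minimal sketch. The msg_header struct, its two fields, and the 6-byte wire layout are hypothetical and not taken from any real protocol; htons, htonl, ntohs, and ntohl are the standard POSIX functions from <arpa/inet.h> (on Windows they come from winsock2.h instead).

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical message header. The rest of the program only ever
   sees it in host byte order. */
struct msg_header {
    uint16_t type;
    uint32_t length;
};

enum { MSG_HEADER_WIRE_SIZE = 6 };  /* 2 bytes of type + 4 bytes of length */

/* The only place where host order becomes network (big-endian) order. */
static void msg_header_encode(const struct msg_header *h,
                              unsigned char out[MSG_HEADER_WIRE_SIZE])
{
    uint16_t type_be = htons(h->type);
    uint32_t length_be = htonl(h->length);
    memcpy(out, &type_be, sizeof type_be);
    memcpy(out + 2, &length_be, sizeof length_be);
}

/* The only place where network order becomes host order again. */
static void msg_header_decode(const unsigned char in[MSG_HEADER_WIRE_SIZE],
                              struct msg_header *h)
{
    uint16_t type_be;
    uint32_t length_be;
    memcpy(&type_be, in, sizeof type_be);
    memcpy(&length_be, in + 2, sizeof length_be);
    h->type = ntohs(type_be);
    h->length = ntohl(length_be);
}
```

With this shape, code that calls send or recv passes the encoded buffer around, and nothing outside these two functions needs to mention byte order.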

剑心龙吟 2024-12-04 16:00:03

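1. Data transferred between computers in binary form depends on endian ordering.
2. C, C++, and C# impose no requirements or restrictions on endianness.
3. The network should follow the protocol. Numbers are converted to the internal format after they are read in, and written out according to the protocol. Internally they can be processed in any format.
4. Worry about endianness only when transferring binary data between computers, whether it is stored in a file or transmitted immediately.
5. Floating-point numbers have a similar issue.
6. Many languages do not care about endianness.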
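To make the floating-point point concrete, here is a hedged sketch of writing a float to a byte stream in a fixed order. It assumes that float is a 32-bit IEEE 754 value, which is common but not guaranteed by the C standard; the function names put_f32_be and get_f32_be are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float has the same size as uint32_t (32-bit IEEE 754). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Writes the bit pattern of f to out, most significant byte first. */
static void put_f32_be(float f, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the bit pattern into an integer */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}

/* Rebuilds the float from four bytes, most significant byte first. */
static float get_f32_be(const unsigned char in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                  | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The byte order of the stream is fixed by the shifts, so the sender and receiver only have to agree on the floating-point format itself.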

爱殇璃 2024-12-04 16:00:03

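Strictly speaking, Java uses the same endianness as the hardware it runs on, but it does not expose this to users of the JVM, because ordinary Java code cannot access raw memory.

1. That's right: the C languages use the layout of the processor the program is currently running on.
2. Correct.
3. Either way, it is best to always convert to network byte order. Sooner or later you will regret not having used htons (and the others) just because it did not matter at the time. The cost is usually minimal, so unless you have a good reason not to, do it!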
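To see what skipping htons actually changes, here is a small sketch that captures the two bytes a 16-bit port number would occupy on the wire, with and without the conversion. The helper name port_wire_bytes is made up for this example; htons is the standard POSIX function.

```c
#include <arpa/inet.h>  /* htons (POSIX; on Windows it lives in winsock2.h) */
#include <stdint.h>
#include <string.h>

/* Copies the two bytes that would go on the wire for a 16-bit port number.
   With use_htons == 0 the bytes are whatever the host keeps in memory. */
static void port_wire_bytes(uint16_t port, int use_htons, unsigned char out[2])
{
    uint16_t v = use_htons ? htons(port) : port;
    memcpy(out, &v, sizeof v);
}
```

For port 8080 (0x1F90), the converted bytes are 1F 90 on every host. Without the conversion, a little-endian host produces 90 1F, which a receiver that follows the protocol would read as port 36895.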

岁月如刀 2024-12-04 16:00:03

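In very abstract terms, the one and only time you must be endian-aware and endian-specific is when you serialize data. This has a very precise meaning, which the C++ language standard actually covers to some extent:

Inside the main part of your program, data lives in variables of some type, written T x;. So far, so portable: your program always does what you want, and you do not need to know how x is represented internally. You know that the memory for x starts at &x and is sizeof(T) bytes long, but nothing else. If you did want to find out, you would have to cast &x from T* to unsigned char*.

Although reading an object through a pointer to an unrelated type (so-called "type punning") is generally not allowed, the standard expressly permits this particular cast, and accessing the bytes of any object through an unsigned char pointer is well defined. Viewing the object as bytes in this way is how you serialize data from an opaque type T into a stream of actual bytes. It is precisely at this moment that you must know about endianness (or, more generally, representation), because you must know in which order the bytes of the stream make up the internal representation of T.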
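The permitted cast can be sketched as follows. Both helper names, dump_bytes and host_is_little_endian, are made up for this illustration, and the second one only distinguishes little-endian hosts from everything else.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the bytes of any object in memory order, lowest address first. */
static void dump_bytes(const void *obj, size_t size)
{
    const unsigned char *p = (const unsigned char *)obj;  /* the permitted cast */
    for (size_t i = 0; i < size; ++i)
        printf("%02x ", p[i]);
    printf("\n");
}

/* Returns 1 if the least significant byte of a 32-bit integer is stored
   at the lowest address, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Calling dump_bytes(&x, sizeof x) on a uint32_t holding 0x01020304 would show the bytes starting with 04 on a little-endian host and with 01 on a big-endian one.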

For integral types you can do without casting pointers, but the interface is still the conversion between byte stream and value:

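unsigned char buf[sizeof(unsigned int)];
unsigned int value = 0;  /* the value to serialize */

/* Write: least significant byte first. We chose an endianness for the stream! */
buf[0] = value & 0xFF; buf[1] = (value >> 8) & 0xFF; buf[2] = (value >> 16) & 0xFF; /* ... */
/* Read: same byte order (ditto). Cast before shifting so the shift is done on unsigned int. */
value = buf[0] + ((unsigned int)buf[1] << 8) + ((unsigned int)buf[2] << 16) /* + ... */;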

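You will find the need to convert values into byte streams, and vice versa, when using operations such as read and write, which are usually associated with files, streams, or sockets.

Note that for integral values we never need to know the endianness of the program itself; we only need to know the endianness used by the byte stream!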
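A fuller version of the shift-based approach might look like the sketch below. The function names are made up for this example, and the width is fixed at 32 bits for simplicity. Nothing in it asks what the host's byte order is; only the byte order of the stream appears in the code.

```c
#include <stdint.h>

/* Stream byte order: little-endian (least significant byte first). */
static void put_u32_le(uint32_t value, unsigned char buf[4])
{
    for (int i = 0; i < 4; ++i)
        buf[i] = (unsigned char)(value >> (8 * i));
}

static uint32_t get_u32_le(const unsigned char buf[4])
{
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= (uint32_t)buf[i] << (8 * i);
    return value;
}

/* Stream byte order: big-endian (most significant byte first),
   the order used by most network protocols. */
static void put_u32_be(uint32_t value, unsigned char buf[4])
{
    for (int i = 0; i < 4; ++i)
        buf[i] = (unsigned char)(value >> (8 * (3 - i)));
}

static uint32_t get_u32_be(const unsigned char buf[4])
{
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= (uint32_t)buf[i] << (8 * (3 - i));
    return value;
}
```

Because the shifts operate on values rather than on memory, the same source produces the same byte stream on a little-endian and a big-endian host.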
