Why is bit endianness an issue in bitfields?
Any portable code that uses bitfields seems to distinguish between little- and big-endian platforms. See the declaration of struct iphdr in the Linux kernel for an example of such code. I fail to understand why bit endianness is an issue at all.
As far as I understand, bitfields are purely compiler constructs, used to facilitate bit level manipulations.
For instance, consider the following bitfield:
    struct ParsedInt {
        unsigned int f1:1;
        unsigned int f2:3;
        unsigned int f3:4;
    };
    uint8_t i;
    struct ParsedInt *d = (struct ParsedInt *)&i;

Here, writing d->f2 is simply a compact and readable way of saying (i >> 1) & ((1 << 3) - 1).
However, bit operations are well-defined and work regardless of the architecture. So, how come bitfields are not portable?
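According to the C standard, the compiler is free to store a bit-field in pretty much any way it wants. You can never make any assumptions about where the bits are allocated. Here are just a few bit-field related things that are not specified by the C standard:
Unspecified behavior
- The alignment of the addressable storage unit allocated to hold a bit-field (6.7.2.1).
Implementation-defined behavior
- Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
- The order of allocation of bit-fields within a unit (6.7.2.1).
Big/little endian is of course also implementation-defined. This means that your struct could be allocated in several different ways (assuming 16-bit ints): for example, f1, f2 and f3 could be allocated starting from the least significant bit or from the most significant bit, with the 8 unused padding bits placed either before or after them.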
Which one applies? Take a guess, or read the in-depth backend documentation of your compiler. Add to this the complexity of 32-bit integers, in big or little endian. Then add the fact that the compiler is allowed to add any number of padding bytes anywhere inside your bit-field, because it is treated as a struct (it can't add padding at the very beginning of the struct, but it can everywhere else).
And I haven't even mentioned what happens if you use plain "int" as the bit-field type (whether it is signed is implementation-defined), or if you use any type other than (unsigned) int (whether that is accepted at all is implementation-defined).
So to answer the question, there is no such thing as portable bit-field code, because the C standard is extremely vague about how bit-fields should be implemented. The only thing bit-fields can be trusted with is to be chunks of boolean values, where the programmer isn't concerned with the location of the bits in memory.
The only portable solution is to use the bit-wise operators instead of bit-fields. The generated machine code will typically be the same, but the layout is deterministic. Bit-wise operators on unsigned types are portable to any C compiler for any system.
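As a minimal sketch of that bit-wise alternative, the ParsedInt fields from the question can be expressed as explicit shifts and masks. The bit positions below (f1 in bit 0, f2 in bits 1..3, f3 in bits 4..7) and the helper names get_field and set_field are assumptions made for illustration; the point is only that the layout is written down in the code rather than left to the compiler.

```c
#include <stdint.h>

/* Assumed layout for this sketch: f1 = bit 0, f2 = bits 1..3, f3 = bits 4..7. */
#define F1_SHIFT 0u
#define F1_MASK  0x01u
#define F2_SHIFT 1u
#define F2_MASK  0x07u
#define F3_SHIFT 4u
#define F3_MASK  0x0Fu

/* Extract a field: shift it down to bit 0, then mask off everything else. */
static inline uint8_t get_field(uint8_t byte, unsigned shift, unsigned mask)
{
    return (uint8_t)((byte >> shift) & mask);
}

/* Store a field: clear its old bits, then OR in the new (masked) value. */
static inline uint8_t set_field(uint8_t byte, unsigned shift, unsigned mask,
                                unsigned value)
{
    return (uint8_t)((byte & ~(mask << shift)) | ((value & mask) << shift));
}
```

With this, get_field(i, F2_SHIFT, F2_MASK) plays the role of d->f2, and it means the same thing on every compiler.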
And that's part of the problem. If the use of bit-fields was restricted to what the compiler 'owned', then how the compiler packed bits or ordered them would be of pretty much no concern to anyone.
However, bit-fields are probably used far more often to model constructs that are external to the compiler's domain: hardware registers, the 'wire' protocol for communications, or file format layout. These things have strict requirements on how bits have to be laid out, and using bit-fields to model them means that you have to rely on implementation-defined and, even worse, unspecified behavior in how the compiler will lay out the bit-field.
In short, bit-fields are not specified well enough to make them useful for the situations they seem to be most commonly used for.
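ISO/IEC 9899: 6.7.2.1 / 10 states that the order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined, and that the alignment of the addressable storage unit is unspecified.
It is safer to use bit shift operations instead of making any assumptions on bit field ordering or alignment when trying to write portable code, regardless of system endianness or bitness.
Also see the CERT C guideline EXP11-C: Do not apply operators expecting one type to data of an incompatible type.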
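To illustrate the shift-based approach on the header the question mentions, here is a small sketch that pulls the version and header-length fields out of the first byte of an IPv4 header. In IPv4 the version is the high nibble of that byte and the IHL is the low nibble; the function names are made up for this example and are not taken from the Linux kernel.

```c
#include <stdint.h>

/* Version is the high-order nibble of the first header byte. */
static inline unsigned ipv4_version(const uint8_t *hdr)
{
    return hdr[0] >> 4;
}

/* IHL (header length in 32-bit words) is the low-order nibble. */
static inline unsigned ipv4_ihl(const uint8_t *hdr)
{
    return hdr[0] & 0x0Fu;
}

/* Header length in bytes. */
static inline unsigned ipv4_header_bytes(const uint8_t *hdr)
{
    return ipv4_ihl(hdr) * 4u;
}
```

Because the code only ever looks at one byte and uses shifts, it needs no little-endian/big-endian #ifdef at all.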
Bit-field accesses are implemented in terms of operations on the underlying type, which in the example is unsigned int. So if you have a struct x whose members are unsigned int bit-fields and you access one of them, say b, the compiler accesses an entire unsigned int and then shifts and masks the appropriate bit range. (Well, it doesn't have to, but we can pretend that it does.)
On big endian, the fields are typically laid out most significant bit first, so the first field declared occupies the high-order bits of the first byte in memory.
On little endian, they are typically laid out least significant bit first, so the first field declared occupies the low-order bits of the first byte.
If you want to access the big endian layout from little endian or vice versa, you'll have to do some extra work. Doing that work transparently would improve portability at a performance cost, and since struct layout is already non-portable, language implementors went with the faster version.
This makes a lot of assumptions. Also note that sizeof(struct x) == 4 on most platforms, provided the fields fit in one 32-bit unsigned int.
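The following sketch makes the layout difference observable. The struct x here, with widths 4, 8 and 20, is a hypothetical stand-in, and pack_x / unpack_b are illustrative helpers that fix one layout explicitly instead of relying on the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration; the field widths are an assumption. */
struct x {
    unsigned int a : 4;
    unsigned int b : 8;
    unsigned int c : 20;
};

/* Return the first byte of the struct as it actually sits in memory. */
static uint8_t first_byte_in_memory(void)
{
    struct x v;
    uint8_t bytes[sizeof v];

    memset(&v, 0, sizeof v);
    v.a = 0x1;
    v.b = 0x23;
    v.c = 0x45678;
    memcpy(bytes, &v, sizeof v);
    return bytes[0];
}

/* Layout-independent equivalent: a = bits 0..3, b = bits 4..11, c = bits 12..31. */
static uint32_t pack_x(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & 0xFu) | ((b & 0xFFu) << 4) | ((c & 0xFFFFFu) << 12);
}

/* Extract b from a word packed by pack_x(). */
static uint32_t unpack_b(uint32_t word)
{
    return (word >> 4) & 0xFFu;
}
```

On a typical little-endian ABI first_byte_in_memory() is expected to give 0x31, and on a typical big-endian ABI 0x12, but neither is guaranteed by the standard. pack_x() gives the same word everywhere.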
The bit-fields will be stored in a different order depending on the endianness of the machine. This may not matter in some cases, but in others it may. Say, for example, that your ParsedInt struct represented flags in a packet sent over a network: a little-endian machine and a big-endian machine would read those flags in a different order from the transmitted byte, which is obviously a problem.
To echo the most salient points: If you are using this on a single compiler/HW platform as a software only construct, then endianness will not be an issue. If you are using code or data across multiple platforms OR need to match hardware bit layouts, then it IS an issue. And a lot of professional software is cross-platform, hence it has to care.
Here's the simplest example: I have code that stores numbers in binary format to disk. If I do not write and read this data to disk myself explicitly byte by byte, then it will not be the same value if read from an opposite endian system.
Concrete example: let's say my program ships with some data on disk that I want to read in, and that the value I want to load is 4096 (0x1000), stored as two bytes.
Suppose I read it as a single 16-bit value, not as explicit bytes.
That means if my system matches the endianness stored on disk, I get 4096, and if it doesn't, I get 16 (0x0010), because the two bytes are swapped!
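A small sketch of that situation, assuming the two bytes on disk are 0x10 followed by 0x00 (that is, 4096 stored big endian). load_native16 mimics a bulk read straight into a 16-bit variable, and bswap16 is the swap you would apply on a mismatch; both names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret two stored bytes as a native 16-bit value, the way a bulk
 * fread() into a uint16_t would. The result depends on the host byte order. */
static uint16_t load_native16(const uint8_t bytes[2])
{
    uint16_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t bswap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

On a host whose byte order matches the file, load_native16 gives 4096; on the opposite kind of host it gives 16, and bswap16 turns one into the other.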
So the most common way of dealing with endianness is to bulk load binary numbers and then do a bswap if the byte orders don't match. In the past, we'd store data on disk as big endian, because Intel was the odd man out and provided high-speed instructions to swap the bytes. Nowadays, Intel is so common that people often make little endian the default and swap when on a big-endian system.
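A slower, but endian-neutral approach is to do ALL I/O by bytes, i.e. pick one byte order for the file, read one byte at a time, and reassemble each value with shifts and ORs.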
Note that this is identical to the code you'd write to do an endian swap, but you no longer need to check the endianness. And you can use macros to make this less painful.
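A minimal sketch of the byte-by-byte approach, assuming little endian is the byte order chosen for the file. The function names u16_from_le and read_u16_le are hypothetical; only fgetc and EOF come from the standard library.

```c
#include <stdint.h>
#include <stdio.h>

/* Rebuild a 16-bit value from two bytes stored low byte first.
 * Works identically on any host, whatever its native byte order. */
static uint16_t u16_from_le(uint8_t lo, uint8_t hi)
{
    return (uint16_t)((unsigned)lo | ((unsigned)hi << 8));
}

/* Read one little-endian 16-bit value from a stream, one byte at a time.
 * Returns 0 on success, -1 on EOF or read error. */
static int read_u16_le(FILE *fp, uint16_t *out)
{
    int lo = fgetc(fp);
    int hi = fgetc(fp);

    if (lo == EOF || hi == EOF)
        return -1;
    *out = u16_from_le((uint8_t)lo, (uint8_t)hi);
    return 0;
}
```

No endianness check appears anywhere: the shifts operate on values, not on memory, so the host byte order never enters the picture.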
I used the example of stored data used by a program.
The other main application mentioned is to write hardware registers, where those registers have an absolute ordering. One VERY COMMON place this comes up is with graphics. Get the endianness wrong and your red and blue color channels get reversed! Again, the issue is one of portability - you could simply adapt to a given hardware platform and graphics card, but if you want your same code to work on different machines, you must test.
A classic test is to store a known 16-bit value in a union with a two-byte array and check which byte comes first.
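One way that test might be written is sketched below; the function name is_little_endian is an assumption, and the value 4096 (0x1000) is reused from the earlier example.

```c
#include <stdint.h>

/* Returns 1 if the host stores the least significant byte first
 * (little endian), 0 otherwise. */
static int is_little_endian(void)
{
    union {
        uint16_t s;
        uint8_t  b[2];
    } probe;

    probe.s = 4096;             /* 0x1000: high byte 0x10, low byte 0x00 */
    return probe.b[0] == 0x00;  /* low byte first means little endian */
}
```

This only detects the byte order of scalars; it says nothing about how the compiler orders bit-fields within a storage unit.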
Note that bitfield issues exist as well but are orthogonal to endianness issues.
Endianness is important when you need to communicate a structure containing bit-fields to an entity over which you don't have control, e.g. in network communication, or when you need to implement some part of an OSI layer. Then you need to follow some agreed-upon protocol that defines in which order (transmission order) the bits are transported and what they mean.
In that sense, I don't understand all the fuss above about bit-field layout not being standardized and the conclusion that you should therefore not use them; I tried to answer this in another related question and gave an example of how I use and assert bit-fields. Rolling your own bit flags is error-prone and makes the code more 'fuzzy' or 'distracting away from your semantics' (for lack of a better term). You can find the example here.
Just to point out - we've been discussing the issue of byte endianness, not bit endianness or endianness in bitfields, which crosses into the other issue:
If you are writing cross-platform code, never just write out a struct as a binary object. Besides the byte endianness issues described above, there can be all kinds of packing and formatting issues between compilers. The languages place almost no restrictions on how a compiler may lay out structs or bitfields in actual memory, so when saving to disk, you must write each data member of a struct one at a time, preferably in a byte-order-neutral way.
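As a rough sketch of writing members one at a time, the following encodes a hypothetical record into a fixed 7-byte little-endian format and decodes it again. The struct, its fields and the function names are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical in-memory record; its padding and layout are up to the compiler. */
struct record {
    uint16_t id;
    uint32_t length;
    uint8_t  flags;
};

#define RECORD_WIRE_SIZE 7u /* 2 + 4 + 1 bytes, no padding in the stored form */

/* Write each member separately, low byte first. */
static void record_encode(const struct record *r, uint8_t out[RECORD_WIRE_SIZE])
{
    out[0] = (uint8_t)(r->id & 0xFFu);
    out[1] = (uint8_t)(r->id >> 8);
    out[2] = (uint8_t)(r->length & 0xFFu);
    out[3] = (uint8_t)((r->length >> 8) & 0xFFu);
    out[4] = (uint8_t)((r->length >> 16) & 0xFFu);
    out[5] = (uint8_t)(r->length >> 24);
    out[6] = r->flags;
}

/* Rebuild each member from its bytes; independent of host byte order. */
static void record_decode(const uint8_t in[RECORD_WIRE_SIZE], struct record *r)
{
    r->id     = (uint16_t)((unsigned)in[0] | ((unsigned)in[1] << 8));
    r->length = (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
                ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
    r->flags  = in[6];
}
```

The stored form is 7 bytes on every platform, whereas sizeof(struct record) is commonly larger because of padding.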
This packing impacts "bit endianness" in bitfields because different compilers might store the bitfields in a different direction, and the bit endianness impacts how they'd be extracted.
So bear in mind BOTH levels of the problem - the byte endianness impacts a computer's ability to read a single scalar value, e.g., a float, while the compiler (and build arguments) impact a program's ability to read in an aggregate structure.
What I have done in the past is to save and load a file in a neutral way and store meta-data about the way the data is laid out in memory. This allows me to use the "fast and easy" binary load path where compatible.