Is there any way to enforce a specific endianness for a C or C++ struct?
I've seen a few questions and answers regarding the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two byte orders.
What I would like to know, however, is whether there is a way to enforce a specific endianness for a given struct. Are there some good compiler directives or other simple solutions, besides rewriting the whole thing with a lot of macros manipulating bitfields?
A general solution would be nice, but I would be happy with a specific gcc solution as well.
Edit:
Thank you for all the comments pointing out why it's not a good idea to enforce endianness, but in my case that's exactly what I need.
A large amount of data is generated by a specific processor (which will never ever change; it's an embedded system with custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge and deep: most of them have many layers of other huge structs inside.
Changing the software for the embedded processor is out of the question. The source is available, which is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.
This is why I need to tell the compiler which endianness it should use. It doesn't matter how efficient it is.
It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processor's own endianness, that's perfectly acceptable to me.
Comments (11)
The way I usually handle this is like so:
Similarly for be_uint32_t. Then you can define your struct like this:
The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:
In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.
With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:
...but for most code, you can just use them as if they were built-in types.
A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.
The following test program:
creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is supported, the file will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly, this is of course very compiler-specific.
Try using
#pragma scalar_storage_order big-endian
to store in big-endian format,
#pragma scalar_storage_order little-endian
to store in little-endian format, or
#pragma scalar_storage_order default
to store in your machine's default endianness.
No, I don't think so.
Endianness is an attribute of the processor that indicates whether integers are represented from left to right or from right to left; it is not an attribute of the compiler.
The best you can do is write code which is independent of any byte order.
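As a minimal sketch of byte-order-independent code, assuming the data is big-endian: the helpers below (load_be32 and store_be32 are hypothetical names) build and split values with shifts, so the result is the same on any host.

```cpp
#include <cassert>
#include <cstdint>

// Decode a 32-bit big-endian value from four bytes.
// Shifts operate on values, not on memory, so host byte order is irrelevant.
inline std::uint32_t load_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Encode a 32-bit value as four big-endian bytes.
inline void store_be32(unsigned char* p, std::uint32_t v) {
    p[0] = static_cast<unsigned char>(v >> 24);
    p[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
    p[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
    p[3] = static_cast<unsigned char>(v & 0xFF);
}

inline void round_trip_check() {
    unsigned char buf[4];
    store_be32(buf, 0x12345678u);
    assert(buf[0] == 0x12 && buf[3] == 0x78);  // most significant byte first
    assert(load_be32(buf) == 0x12345678u);
}
```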
No, there's no such capability. If it existed, it could force compilers to generate excessive or inefficient code, so C++ just doesn't support it.
The usual C++ way to deal with serialization (which I assume is what you're trying to solve) is to let the struct remain in memory in the exact layout desired and do the serialization in such a way that endianness is preserved upon deserialization.
I am not sure if the following can be modified to suit your purposes, but where I work, we have found the following to be quite useful in many cases.
When endianness is important, we use two different data structures. One represents the data as it is expected to arrive. The other is how we want it to be represented in memory. Conversion routines are then developed to switch between the two.
The workflow operates thusly ...
We find this decoupling useful because (but not limited to) ...
Hopefully this decoupling can be useful to your application too.
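The two-structure idea could be sketched as follows. All names here (wire_reading, reading, from_wire, to_wire) are hypothetical, and big-endian arrival order is an assumption. The wire struct holds only byte arrays, so it has no padding and no byte order of its own, while the in-memory struct uses native integers:

```cpp
#include <cassert>
#include <cstdint>

// How the data is expected to arrive (big-endian assumed).
struct wire_reading {
    unsigned char sensor_id[2];
    unsigned char value[4];
};

// How the program wants to work with it: native integers.
struct reading {
    std::uint16_t sensor_id;
    std::uint32_t value;
};

// Conversion routine: wire representation to in-memory representation.
inline reading from_wire(const wire_reading& w) {
    reading r;
    r.sensor_id = static_cast<std::uint16_t>((w.sensor_id[0] << 8) | w.sensor_id[1]);
    r.value = (static_cast<std::uint32_t>(w.value[0]) << 24) |
              (static_cast<std::uint32_t>(w.value[1]) << 16) |
              (static_cast<std::uint32_t>(w.value[2]) << 8) |
              static_cast<std::uint32_t>(w.value[3]);
    return r;
}

// Conversion routine: in-memory representation back to wire representation.
inline wire_reading to_wire(const reading& r) {
    wire_reading w;
    w.sensor_id[0] = static_cast<unsigned char>(r.sensor_id >> 8);
    w.sensor_id[1] = static_cast<unsigned char>(r.sensor_id & 0xFF);
    w.value[0] = static_cast<unsigned char>(r.value >> 24);
    w.value[1] = static_cast<unsigned char>((r.value >> 16) & 0xFF);
    w.value[2] = static_cast<unsigned char>((r.value >> 8) & 0xFF);
    w.value[3] = static_cast<unsigned char>(r.value & 0xFF);
    return w;
}

inline void round_trip_check() {
    reading r{0x0102, 256};
    reading back = from_wire(to_wire(r));
    assert(back.sensor_id == r.sensor_id && back.value == r.value);
}
```

The rest of the program then touches only reading, so a change in the wire format stays confined to the two conversion routines.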
A possible innovative solution would be to use a C interpreter like Ch and force the endian coding to big.
Boost provides endian buffers for this.
For example:
Maybe not a direct answer, but having a read through this question can hopefully answer some of your concerns.
You could make the structure a class with getters and setters for the data members. The getters and setters are implemented with something like:
We do this sometimes when we read a structure in from a file - we read it into a struct and use this on both big-endian and little-endian machines to access the data properly.
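A minimal sketch of the getter/setter approach, assuming the file stores its fields big-endian. The file_header class, its fields, and the helper names are hypothetical. The members keep the bytes exactly as they appear in the file, and the accessors swap only when the host is little-endian:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Runtime probe: true if the lowest-addressed byte of an integer is the least significant.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

inline std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Big-endian <-> host conversion; the same operation works in both directions.
inline std::uint32_t big_to_host32(std::uint32_t v) {
    return host_is_little_endian() ? swap32(v) : v;
}

// Hypothetical header whose members hold the raw file bytes.
class file_header {
public:
    // Fill the members directly from the bytes read from the file.
    static file_header from_bytes(const unsigned char* p) {
        file_header h;
        std::memcpy(&h.size_, p, sizeof h.size_);
        std::memcpy(&h.count_, p + sizeof h.size_, sizeof h.count_);
        return h;
    }
    // Copy the members back out unchanged, i.e. still big-endian.
    void to_bytes(unsigned char* p) const {
        std::memcpy(p, &size_, sizeof size_);
        std::memcpy(p + sizeof size_, &count_, sizeof count_);
    }
    std::uint32_t size() const { return big_to_host32(size_); }
    void set_size(std::uint32_t v) { size_ = big_to_host32(v); }
    std::uint32_t count() const { return big_to_host32(count_); }
    void set_count(std::uint32_t v) { count_ = big_to_host32(v); }
private:
    std::uint32_t size_ = 0;   // stored in file (big-endian) order
    std::uint32_t count_ = 0;  // stored in file (big-endian) order
};

inline void demo() {
    const unsigned char raw[8] = {0, 0, 1, 0, 0, 0, 0, 5};
    file_header h = file_header::from_bytes(raw);
    assert(h.size() == 256 && h.count() == 5);
}
```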
There is a data representation for this called XDR. Have a look at it.
http://en.wikipedia.org/wiki/External_Data_Representation
It might be a little too much for your embedded system, though. Try searching for an already implemented library that you can use (check license restrictions!).
XDR is generally used in networked systems, since they need a way to move data in an endianness-independent way, but nothing says that it cannot be used outside of networks.