Is there a way to enforce a specific endianness for a C or C++ struct?

Posted on 2024-11-24 05:28:55

I've seen a few questions and answers regarding the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two byte orders.

What I would like to know, however, is whether there is a way to enforce a specific endianness for a given struct. Are there some good compiler directives or other simple solutions, besides rewriting the whole thing with a lot of macros manipulating bitfields?

A general solution would be nice, but I would be happy with a specific gcc solution as well.

Edit:

Thank you for all the comments pointing out why it's not a good idea to enforce endianness, but in my case that's exactly what I need.

A large amount of data is generated by a specific processor (which will never change; it's an embedded system with custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge and deep: most of them have many layers of other huge structs inside.

Changing the software for the embedded processor is out of the question. The source is available, which is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.

This is why I need to tell the compiler which endianness it should use, no matter how efficient or inefficient the result is.

It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processor's own endianness, that's perfectly acceptable to me.

Comments (11)

小矜持 2024-12-01 05:28:55

The way I usually handle this is like so:

#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>

class be_uint16_t {
public:
        be_uint16_t() : be_val_(0) {
        }
        // Transparently cast from uint16_t
        be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
        }
        // Transparently cast to uint16_t
        operator uint16_t() const {
                return ntohs(be_val_);
        }
private:
        uint16_t be_val_;
} __attribute__((packed));

Similarly for be_uint32_t.
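For reference, a minimal sketch of what the 32-bit counterpart might look like, assuming the same pattern with htonl()/ntohl() in place of htons()/ntohs(). The stored_byte() helper is hypothetical and not part of the answer; it is only there to let you check that the bytes in memory really are big-endian.

```cpp
#include <arpa/inet.h> // for htonl(), ntohl()
#include <assert.h>
#include <stdint.h>
#include <string.h>    // for memcpy()

// Sketch of the 32-bit counterpart: the value is stored big-endian in
// memory and converted to host order every time it is read.
class be_uint32_t {
public:
        be_uint32_t() : be_val_(0) {
        }
        // Transparently cast from uint32_t
        be_uint32_t(const uint32_t &val) : be_val_(htonl(val)) {
        }
        // Transparently cast to uint32_t
        operator uint32_t() const {
                return ntohl(be_val_);
        }
private:
        uint32_t be_val_;
} __attribute__((packed));

// Hypothetical helper: returns the i-th byte of the in-memory
// representation, so the storage order can be inspected.
inline unsigned char stored_byte(const be_uint32_t &v, size_t i) {
        assert(i < sizeof(v));
        unsigned char raw[sizeof(v)];
        memcpy(raw, &v, sizeof(v));
        return raw[i];
}
```

With this in place, be_uint32_t x = 0x11223344u; reads back as 0x11223344 on any host, while stored_byte(x, 0) is 0x11.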

Then you can define your struct like this:

struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));

The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:

be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form

In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.

With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:

be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Cast needed: varargs do not trigger the implicit conversion

...but for most code, you can just use them as if they were built-in types.

相权↑美人 2024-12-01 05:28:55

A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.

The following test program:

#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};


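int main(void) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};

    FILE *f = fopen("out.bin", "wb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    /* Write the struct exactly as it is laid out in memory. */
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
    return written == 1 ? 0 : 1;
}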

creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is supported, the file will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly, this is of course very compiler-specific.

浮生面具三千个 2024-12-01 05:28:55

Try using
#pragma scalar_storage_order big-endian to store data in big-endian format,
#pragma scalar_storage_order little-endian to store it in little-endian format, or
#pragma scalar_storage_order default to store it in your machine's default endianness.

Read more here

锦欢 2024-12-01 05:28:55

No, I don't think so.

Endianness is an attribute of the processor that indicates whether integers are represented from left to right or right to left; it is not an attribute of the compiler.

The best you can do is write code which is independent of any byte order.
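As a rough illustration of byte-order-independent code, here is a sketch that reads and writes a 32-bit big-endian field using only shifts on values, so the host's own byte order never enters the picture. The names load_be32 and store_be32 are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from four bytes stored most significant first.
// Shifts operate on values, not on memory, so this gives the same result
// on little-endian and big-endian hosts.
inline std::uint32_t load_be32(const unsigned char *p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Write a 32-bit value as four bytes, most significant first.
inline void store_be32(unsigned char *p, std::uint32_t v) {
    assert(p != nullptr);
    p[0] = static_cast<unsigned char>(v >> 24);
    p[1] = static_cast<unsigned char>(v >> 16);
    p[2] = static_cast<unsigned char>(v >> 8);
    p[3] = static_cast<unsigned char>(v);
}
```

The cost is that every field has to go through such a helper, which is exactly the byte-wise work the question hopes to avoid for hundreds of nested structs.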

相思碎 2024-12-01 05:28:55

No, there's no such capability. If it existed, it could force compilers to generate excessive or inefficient code, so C++ just doesn't support it.

The usual C++ way to deal with serialization (which I assume is what you're trying to solve) is to let the struct remain in memory in the exact layout desired and do the serialization in such a way that endianness is preserved upon deserialization.

不可一世的女人 2024-12-01 05:28:55

I am not sure if the following can be modified to suit your purposes, but where I work, we have found the following to be quite useful in many cases.

When endianness is important, we use two different data structures. One represents the data as it is expected to arrive. The other is how we want it to be represented in memory. Conversion routines are then developed to switch between the two.

The workflow operates as follows:

  1. Read the data into the raw structure.
  2. Convert the "raw structure" to the "in memory version"
  3. Operate only on the "in memory version"
  4. When done operating on it, convert the "in memory version" back to the "raw structure" and write it out.
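
The four steps above could be sketched roughly as follows. Everything here is hypothetical: the record layout (a 16-bit id followed by a 32-bit length, both big-endian) and the names RawRecord, Record, read_record, from_raw and to_raw are invented for illustration, not taken from any real system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// "Raw structure": exactly the bytes as they arrive (big-endian assumed).
// Only byte arrays are used, so there is no padding and no alignment issue.
struct RawRecord {
    unsigned char id[2];
    unsigned char length[4];
};
static_assert(sizeof(RawRecord) == 6, "raw layout must match the wire format");

// "In memory version": native types in the host's own byte order.
struct Record {
    std::uint16_t id;
    std::uint32_t length;
};

// Step 2: raw -> in-memory. All conversions live in these two functions.
inline Record from_raw(const RawRecord &r) {
    Record m;
    m.id = static_cast<std::uint16_t>((r.id[0] << 8) | r.id[1]);
    m.length = (static_cast<std::uint32_t>(r.length[0]) << 24) |
               (static_cast<std::uint32_t>(r.length[1]) << 16) |
               (static_cast<std::uint32_t>(r.length[2]) << 8) |
                static_cast<std::uint32_t>(r.length[3]);
    return m;
}

// Step 4: in-memory -> raw, ready to be written out again.
inline RawRecord to_raw(const Record &m) {
    RawRecord r;
    r.id[0] = static_cast<unsigned char>(m.id >> 8);
    r.id[1] = static_cast<unsigned char>(m.id);
    r.length[0] = static_cast<unsigned char>(m.length >> 24);
    r.length[1] = static_cast<unsigned char>(m.length >> 16);
    r.length[2] = static_cast<unsigned char>(m.length >> 8);
    r.length[3] = static_cast<unsigned char>(m.length);
    return r;
}

// Steps 1 and 2 together: copy incoming bytes into the raw structure,
// then convert it.
inline Record read_record(const unsigned char *bytes, std::size_t n) {
    assert(n >= sizeof(RawRecord));
    RawRecord r;
    std::memcpy(&r, bytes, sizeof(r));
    return from_raw(r);
}
```

Step 3 is then ordinary code on Record, with no byte-order concerns at all.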

We find this decoupling useful because (but not limited to) ...

  1. All conversions are located in one place only.
  2. Fewer headaches about memory alignment issues when working with the "in memory version".
  3. It makes porting from one arch to another much easier (fewer endian issues).

Hopefully this decoupling can be useful to your application too.

甜嗑 2024-12-01 05:28:55

A possible innovative solution would be to use a C interpreter like Ch and force the byte order to big-endian.

芯好空 2024-12-01 05:28:55

Boost provides endian buffers for this.

For example:

#include <boost/endian/buffers.hpp>
#include <boost/static_assert.hpp>

using namespace boost::endian;

struct header {
    big_int32_buf_t     file_code;
    big_int32_buf_t     file_length;
    little_int32_buf_t  version;
    little_int32_buf_t  shape_type;
};
BOOST_STATIC_ASSERT(sizeof(header) == 16U);

夜空下最亮的亮点 2024-12-01 05:28:55

Maybe not a direct answer, but having a read through this question can hopefully answer some of your concerns.

酷到爆炸 2024-12-01 05:28:55

You could make the structure a class with getters and setters for the data members. The getters and setters are implemented with something like:

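// _value always holds the big-endian (file) representation.
// BIG_ENDIAN stands for a project-defined macro that is set only on
// big-endian hosts. Some system headers define a macro of the same name
// unconditionally, so a real build should test a different symbol.
int getSomeValue( void ) const {
#if defined( BIG_ENDIAN )
    return _value;
#else
    return convert_to_little_endian( _value );
#endif
}

void setSomeValue( int newValue) {
#if defined( BIG_ENDIAN )
    _value = newValue;
#else
    _value = convert_to_big_endian( newValue );
#endif
}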

We do this sometimes when we read a structure in from a file - we read it into a struct and use this on both big-endian and little-endian machines to access the data properly.
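
A self-contained variant of this idea might look like the sketch below. It is only an illustration under a few assumptions: it relies on the GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ and on __builtin_bswap32, it treats any host that does not report big-endian as little-endian, and it uses a fixed-width uint32_t instead of int. The class name FileHeader and the storedByte() accessor are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Convert between host order and big-endian. A byte swap is its own
// inverse, so one function serves both directions.
inline std::uint32_t swap_if_little_endian(std::uint32_t v) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return v;                    // host is already big-endian
#else
    return __builtin_bswap32(v); // assumed little-endian host
#endif
}

class FileHeader {
public:
    // Getter: stored big-endian value -> host order.
    std::uint32_t getSomeValue() const {
        return swap_if_little_endian(value_);
    }
    // Setter: host order -> stored big-endian value.
    void setSomeValue(std::uint32_t newValue) {
        value_ = swap_if_little_endian(newValue);
    }
    // Inspect byte i of the stored field (illustration only).
    unsigned char storedByte(std::size_t i) const {
        assert(i < sizeof(value_));
        unsigned char raw[sizeof(value_)];
        std::memcpy(raw, &value_, sizeof(value_));
        return raw[i];
    }
private:
    std::uint32_t value_ = 0; // always kept in big-endian byte order
};
```

Because the field itself stays in file order, an object like this can be filled directly from the file's bytes and the accessors do the conversion on demand.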

囚我心虐我身 2024-12-01 05:28:55

There is a data representation for this called XDR. Have a look at it.
http://en.wikipedia.org/wiki/External_Data_Representation

It might be a little too much for your embedded system, though. Try searching for an existing library that you can use (check license restrictions!).

XDR is generally used in networked systems, since they need a way to move data in an endianness-independent way, but nothing says that it cannot be used outside of networks.
