Efficiency of data structures in C99 (possibly affected by endianness)

Posted 2024-10-09 03:34:51

I have a couple of questions that are all inter-related. Basically, in the algorithm I am implementing, a word w is defined as four bytes, so it can be contained whole in a uint32_t.

However, during the operation of the algorithm I often need to access the various parts of the word. Now, I can do this in two ways:

uint32_t w = 0x11223344;
uint8_t a = (w & 0xff000000) >> 24;
uint8_t b = (w & 0x00ff0000) >> 16;
uint8_t c = (w & 0x0000ff00) >>  8;
uint8_t d = (w & 0x000000ff);

However, part of me thinks that isn't particularly efficient. I thought a better way would be to use union representation like so:

typedef union
{
    struct
    {
        uint8_t d;
        uint8_t c;
        uint8_t b;
        uint8_t a;
    };
    uint32_t n;
} word32;

Using this method I can assign word32 w = { .n = 0x11223344 }; then I can access the various parts as I require (w.a == 0x11 on a little-endian system).

However, at this stage I come up against endianness issues; namely, on big-endian systems my struct is defined incorrectly, so I need to re-order the bytes of the word before it is passed in.

This I can do without too much difficulty. My question, then, is: is the first approach (the various bitwise ANDs and shifts) as efficient as the implementation using a union? Is there any difference between the two in general? Which way should I go on a modern x86_64 processor? Is endianness just a red herring here?

I could inspect the assembly output of course, but my knowledge of compilers is not brilliant. I would have thought a union would be more efficient as it would essentially convert to memory offsets, like so:

mov eax, [r9+8]

Would a compiler realise that this is what is happening in the bit-shift case above?

If it matters, I'm using C99, specifically my compiler is clang (llvm).
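
Since clang (like gcc) predefines __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__, here is a minimal sketch of one way to handle the re-ordering, assuming those macros are available (other compilers may need a different test): pick the field order of the struct at compile time.

#include <stdint.h>

typedef union
{
    struct
    {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        uint8_t a;   /* most significant byte sits at the lowest address */
        uint8_t b;
        uint8_t c;
        uint8_t d;
#else
        uint8_t d;   /* least significant byte sits at the lowest address */
        uint8_t c;
        uint8_t b;
        uint8_t a;
#endif
    };
    uint32_t n;
} word32;

With this layout, w.a is always the most significant byte of n, whichever byte order the host uses.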

Thanks in advance.

Comments (4)

凉城 2024-10-16 03:34:52

Given that accessing bits using shifts and masks is a common operation, I'd expect compilers to be quite smart about it, especially if you're using constant shift counts and masks.

An option would be to use macros for bit set/get, so that you can pick the best strategy at configure time if, on a specific platform, the compiler happens to be on the dumb side (and wisely chosen macro names can also make the code clearer and more self-explanatory).
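
A minimal sketch of what such macros might look like, using the shift-and-mask strategy (the names GET_BYTE and SET_BYTE are illustrative only); byte 0 is the least significant byte, so the result does not depend on the host byte order:

#include <stdint.h>

/* Extract byte i (0 = least significant) from a 32-bit word. */
#define GET_BYTE(w, i)    ((uint8_t)(((w) >> (8 * (i))) & 0xffu))

/* Replace byte i of w with the 8-bit value v. */
#define SET_BYTE(w, i, v) ((w) = ((w) & ~(UINT32_C(0xff) << (8 * (i)))) | \
                                 ((uint32_t)(uint8_t)(v) << (8 * (i))))

For example, GET_BYTE(0x11223344, 3) yields 0x11; if the union turned out to be faster on some platform, only the macro bodies would need to change.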

心的位置 2024-10-16 03:34:51

If you need AES, why not use an existing implementation? This can be particularly beneficial on modern Intel processors with hardware support for AES.

The union trick can slow things down due to store-to-load-forwarding (STLF) failures. This may happen, depending on the processor model, if you write data to memory and read it back soon afterwards as a different data type (e.g. 32-bit vs. 8-bit).
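
A minimal sketch of the pattern being described, reusing the word32 union from the question (the function name is illustrative; whether the narrow reload actually stalls depends on the processor model, and with optimisation enabled the compiler may keep w in a register and avoid the store altogether):

#include <stdint.h>

typedef union { struct { uint8_t d, c, b, a; }; uint32_t n; } word32;

uint8_t top_byte(uint32_t x)
{
    word32 w;
    w.n = x;       /* 32-bit store                                   */
    return w.a;    /* 8-bit load from inside that store; on some     */
                   /* processor models this load cannot be forwarded */
                   /* and must wait for the store to complete.       */
}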

高冷爸爸 2024-10-16 03:34:51

Such a thing is hard to tell without being able to inspect the real use of these operations in your code:

  • The shift version will probably do better if you happen to have all your variables in registers anyway, and you then do intensive computations on them. Usually compilers (clang included) are relatively clever about issuing instructions for partial words and that sort of thing.
  • The union version would perhaps be more efficient if you have to load your bytes from memory most of the time.

In any case I would abstract the access operation into a macro, so that you can modify it easily once you have working code.

For my personal taste I would go for the shift version, since it is conceptually simpler, and only switch to the union if the produced assembler ultimately didn't look satisfactory.
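
A sketch of how that comparison could be made: put both strategies into small functions (the names here are illustrative) and look at what clang emits, e.g. with clang -std=c99 -O2 -S.

#include <stdint.h>

typedef union { struct { uint8_t d, c, b, a; }; uint32_t n; } word32;

/* Shift-and-mask version: extract byte 2, counting from the least significant byte. */
uint8_t byte2_shift(uint32_t w)
{
    return (uint8_t)((w >> 16) & 0xff);
}

/* Union version of the same extraction (little-endian field layout assumed). */
uint8_t byte2_union(uint32_t x)
{
    word32 w;
    w.n = x;
    return w.b;
}

With optimisation enabled, both typically compile down to the same couple of instructions, which matches the observation in the answer below.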

感情旳空白 2024-10-16 03:34:51

I would guess using a union may be more efficient. Of course, the compiler may be able to optimize the shifts into byte loads since they are known during compilation -- in which case both schemes will yield identical code.

Another option (also byte-order dependent) is to cast the word to a byte array and access the bytes directly, i.e. something like the following:

uint8_t b = ((uint8_t *)&w)[n];

I'm not sure you will see any difference on a real modern 32/64 bit processor, though.

EDIT: It seems like clang produces identical code in both cases.
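
A minimal, self-contained sketch of this cast-based approach (the function name is illustrative); the index needed for a given byte depends on the host byte order, and going through uint8_t, which is unsigned char on common platforms, keeps the access within the usual aliasing rules:

#include <stdint.h>

uint8_t nth_byte(const uint32_t *w, unsigned n)
{
    /* View the 32-bit word as an array of four bytes: on a      */
    /* little-endian machine index 0 is the least significant    */
    /* byte, on a big-endian machine it is the most significant. */
    return ((const uint8_t *)w)[n];
}

With uint32_t w = 0x11223344; a call such as nth_byte(&w, 3) yields 0x11 on a little-endian x86_64 machine.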
