Why does this bad uniform initializer syntax compile and result in unpredictable behavior?

Posted on 2025-02-03 14:35:39


I have a bunch of code for working with hardware (FPGA) registers, which is roughly of the form:

#include <cstdint>

struct SomeRegFields {
    unsigned int lower : 16;
    unsigned int upper : 16;
};

union SomeReg {
    uint32_t wholeReg;
    SomeRegFields fields;
};

(Most of these register types are more complex. This is illustrative.)

While cleaning up a bunch of code that set up registers in the following way:

SomeReg reg1;
reg1.wholeReg = 0;
// ... assign individual fields
card->writeReg(REG1_ADDRESS, reg1.wholeReg);

SomeReg reg2;
reg2.wholeReg = card->readReg(REG2_ADDRESS);
// ... do something with reg2 field values

I got a bit absent-minded and accidentally ended up with the following:

SomeReg reg1{ reg1.wholeReg = 0 };
SomeReg reg2{ reg2.wholeReg = card->readReg(REG2_ADDRESS) };

The reg1.wholeReg = part is wrong, of course, and should be removed.
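For comparison, here is a minimal sketch of what the two lines were presumably meant to be, assuming C++17. The real card interface isn't shown, so fakeReadReg() and the REG2_ADDRESS value are hypothetical stand-ins. The functions are constexpr only so the results can be checked at compile time; real register reads would of course happen at run time.

```cpp
#include <cstdint>

struct SomeRegFields {
    unsigned int lower : 16;
    unsigned int upper : 16;
};

union SomeReg {
    std::uint32_t wholeReg;
    SomeRegFields fields;
};

// Hypothetical stand-in for card->readReg(); the real access layer is not shown.
constexpr std::uint32_t fakeReadReg(std::uint32_t /*address*/) { return 0xABCD1234u; }

constexpr std::uint32_t REG2_ADDRESS = 0x10; // hypothetical address

// Empty braces initialize the union's first member (wholeReg) to 0,
// replacing "SomeReg reg1; reg1.wholeReg = 0;".
constexpr SomeReg makeReg1() {
    SomeReg reg1{};
    return reg1;
}

// A braced initializer initializes the union's first member directly,
// so no "reg2.wholeReg =" belongs inside the braces.
constexpr SomeReg makeReg2() {
    SomeReg reg2{ fakeReadReg(REG2_ADDRESS) };
    return reg2;
}
```

Because a braced initializer for a union always targets its first member, the order of members in each register union decides what a single value in the braces initializes.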

What's bugging me is that this compiles on both MSVC and GCC. I would have expected a syntax error here. Moreover, sometimes it works fine and the value actually gets copied/assigned correctly, but other times it results in a 0 value even though the register value returned is non-zero. It's unpredictable, but which cases work and which don't appears to be consistent between runs.

Any idea why the compilers don't flag this as bad syntax, and why it seems to work in some cases but breaks in others? I assume this is undefined behavior, of course, but why would it change behavior between what often seem like nearly identical calls, often back-to-back?

Some compilation info:

If I run this through Compiler Explorer:

int main()
{
    SomeReg myReg { myReg.wholeReg = 10 };
    return myReg.fields.upper;
}

This is the code GCC trunk spits out for main with optimization off (-O0):

main:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], 10
*   mov     eax, DWORD PTR [rbp-4]
*   mov     DWORD PTR [rbp-4], eax
    movzx   eax, WORD PTR [rbp-2]
    movzx   eax, ax
    pop     rbp
    ret

The lines marked with * are the only difference between this version and a version without the bad myReg.wholeReg = part. MSVC gives similar results, though it seems to apply some optimization even with optimization off. In this case the bad code just adds an extra copy into a register and back out again, so it still works as intended. Given my accidental experimental results, though, it must not always compile this way in more complex cases, i.e. when the value being assigned is not known at compile time.


Answer by 悲凉≈, 2025-02-10 14:35:39

reg1.wholeReg = card->readReg(REG2_ADDRESS)

This is simply treated as an expression. You are assigning the return value of card->readReg(REG2_ADDRESS) to reg1.wholeReg and then using the result of this expression (an lvalue referring to reg1.wholeReg) to aggregate-initialize the first member of reg2 (i.e. reg2.wholeReg). Afterwards, reg1 and reg2 should hold the same value: the return value of the function.
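A compile-time sketch of the well-defined case described here, where reg1 is a separate object that already exists before reg2 is initialized. fakeReadReg() is a hypothetical stand-in for card->readReg(REG2_ADDRESS), and constexpr is used only so the two values can be checked at compile time.

```cpp
#include <cstdint>

struct SomeRegFields {
    unsigned int lower : 16;
    unsigned int upper : 16;
};

union SomeReg {
    std::uint32_t wholeReg;
    SomeRegFields fields;
};

// Hypothetical stand-in for card->readReg(REG2_ADDRESS).
constexpr std::uint32_t fakeReadReg() { return 0x12345678u; }

struct TwoValues {
    std::uint32_t reg1;
    std::uint32_t reg2;
};

constexpr TwoValues assignThenInitialize() {
    SomeReg reg1{};  // a separate, already-initialized object
    // "reg1.wholeReg = fakeReadReg()" is an ordinary assignment expression.
    // Its result is an lvalue referring to reg1.wholeReg, and that lvalue's
    // value is what initializes reg2's first member, reg2.wholeReg.
    SomeReg reg2{ reg1.wholeReg = fakeReadReg() };
    return { reg1.wholeReg, reg2.wholeReg };
}
```

Both fields of the returned TwoValues end up equal to the stub's return value, matching the claim that reg1 and reg2 hold the same value afterwards.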

Syntactically the same happens in

SomeReg reg1{ reg1.wholeReg = 0 };

However, here it is technically undefined behavior, since you are not allowed to access variables or class members before they are initialized. Practically speaking, I would expect this to usually work nonetheless: reg1.wholeReg is assigned 0 and then initialized to 0 once again.

Referring to a variable in its own initializer is syntactically correct and may sometimes be useful (e.g. to pass a pointer to the variable itself). This is why there is no compilation error.
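One common legitimate case, as a minimal sketch: the sentinel node of an empty circular list points at itself, so the variable's own address appears in its initializer. Node and sentinelIsSelfLinked() are hypothetical names, and constexpr only lets the check run at compile time.

```cpp
// Hypothetical node type for a circular singly linked list.
struct Node {
    Node* next;
    int value;
};

constexpr bool sentinelIsSelfLinked() {
    // Taking the address of a variable inside its own initializer is fine;
    // reading or assigning its members before initialization completes
    // (as in the question) is what causes trouble.
    Node sentinel{ &sentinel, 0 };
    return sentinel.next == &sentinel;
}
```

Since the language has to allow this pattern, the compiler cannot reject every use of a variable in its own initializer as a syntax error.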

int main()
{
    SomeReg myReg { myReg.wholeReg = 10 };
    return myReg.fields.upper;
}

This has additional undefined behavior, even if you fix the initialization, because you can't use a union in C++ for type punning at all. That is always undefined behavior, although some compilers might allow it to the degree that is allowed in C. Still, the standard does not allow reading fields.upper if wholeReg is the active member of the union (meaning the last member to which a value was assigned).
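The answer doesn't spell out what to use instead. Below is a minimal sketch of two common, well-defined alternatives, neither of which comes from the answer itself: shifts and masks on the raw value, or std::memcpy into the bit-field struct. The helper names (lowerField, upperField, withUpperField, toFields) are hypothetical, and the size check encodes an assumption about the target compiler.

```cpp
#include <cstdint>
#include <cstring>

// Option 1: keep only the raw 32-bit value and extract fields with shifts
// and masks. This does not depend on bit-field layout, which is
// implementation-defined.
constexpr std::uint32_t lowerField(std::uint32_t whole) { return whole & 0xFFFFu; }
constexpr std::uint32_t upperField(std::uint32_t whole) { return (whole >> 16) & 0xFFFFu; }

constexpr std::uint32_t withUpperField(std::uint32_t whole, std::uint32_t upper) {
    return (whole & 0x0000FFFFu) | ((upper & 0xFFFFu) << 16);
}

// Option 2: keep the bit-field struct but copy the bytes with std::memcpy
// instead of reading an inactive union member. Which bits land in lower
// and upper is still decided by the compiler's bit-field layout.
struct SomeRegFields {
    unsigned int lower : 16;
    unsigned int upper : 16;
};
static_assert(sizeof(SomeRegFields) == sizeof(std::uint32_t),
              "assumes the two 16-bit fields pack into 32 bits");

inline SomeRegFields toFields(std::uint32_t whole) {
    SomeRegFields f;
    std::memcpy(&f, &whole, sizeof f);
    return f;
}
```

Option 1 is often the safer choice for hardware registers, because the bit positions are written out explicitly instead of being left to the compiler.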
