C union type punning with arrays

Posted 2025-01-14 10:31:44, 829 words, 4 views, 0 comments

Given the following code, I have some questions related to type punning. I do not see any way that this isn't violating strict aliasing rules, but I cannot point to the specific violation. My best guess is that passing the union members into the function violates strict aliasing.

The following code is on Compiler Explorer.

#include <stdint.h>

union my_type
{
    uint8_t m8[8];
    uint16_t m16[4];
    uint32_t m32[2];
    uint64_t m64;
};

int func(uint16_t *x, uint32_t *y)
{
    return *y += *x;
}

int main(int argc, char *argv[])
{
    union my_type mine = {.m64 = 1234567890};
    return func(mine.m16, mine.m32);
}

My observations:

  • Assuming the arguments to func do not alias each other, func does not violate strict aliasing.
  • In C, it is permissible to use a union for type punning.
  • Passing m16 and m32 into func must violate something.

My questions:

  • Is type punning with arrays like this valid?
  • What exactly am I violating by passing the pointers into func?
  • What other gotchas am I missing in this example?

Comments (3)

暗恋未遂 2025-01-21 10:31:44

The rule violated is C 2018 6.5.16.1 3:

If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.

Specifically, in *y += *x, the value being stored in the object pointed to by y, mine.m32[0], is read from another object, mine.m16[0], that overlaps the storage of mine.m32[0], but the overlap is not exact and the objects do not have compatible types, regardless of qualifiers.

Note that this rule is for simple assignment, as in E1 = E2, whereas the code has a compound assignment, E1 += E2. However, the compound assignment E1 += E2 is defined in 6.5.16.2 3 to be equivalent to E1 = E1 + E2 except that the lvalue E1 is evaluated only once.

Is type punning with arrays like this valid?

Yes, the C standard allows aliasing via union members; reading a member other than the last one stored will reinterpret the bytes in the new type. However, this does not absolve a program of conforming to other rules if its behavior is to be defined by the C standard, notably the rule quoted above.
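
As a minimal sketch of what punning through the union object itself might look like (as opposed to punning through two separately typed pointers), consider the following. The helper name add_m16_to_m32 is hypothetical and not part of the question. Copying m16[0] into a local before the store is a deliberately conservative choice, made here on the assumption that it keeps the code clear of the overlap rule quoted above. The numeric result depends on the platform's byte order.

```c
#include <stdint.h>

union my_type
{
    uint8_t m8[8];
    uint16_t m16[4];
    uint32_t m32[2];
    uint64_t m64;
};

/* Hypothetical helper: every access goes through the union type. */
uint32_t add_m16_to_m32(union my_type *u)
{
    /* Read the overlapping member into a local before storing. */
    uint16_t first16 = u->m16[0];

    u->m32[0] += first16;
    return u->m32[0];
}
```

A caller would pass &mine rather than mine.m16 and mine.m32, so the overlap is visible in the function's own source.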

What exactly am I violating by passing the pointers into func?

No rule is violated by passing the pointers. The assignment using the pointers violates a rule, as answered above.

What other gotchas am I missing in this example?

If we change func:

int func(uint16_t *x, uint32_t *y)
{
    *y += 1;
    *x += 1;
    return *y;
}

then the rule in 6.5.16.1 3 does not apply, as there is no assignment involving overlapping objects. And the aliasing rules in 6.5 7 are not violated, as *y is an object defined as the type used to access it, uint32_t, and *x is an object defined as the type used to access it, uint16_t. Yet, if this function is translated in isolation (without the union definition visible), the compiler is permitted to assume *x and *y do not overlap, so it may cache the value of *y produced by *y += 1; and return that cached value, in ignorance of the fact that *x += 1; changes *y. This is a defect in the C standard.
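
One way to sidestep the caching concern, sketched here under the assumption that the function can be changed to take the union itself, is to make both accesses through the union type. The name bump_both is hypothetical. Because both stores are written as member accesses on the same union object, the compiler has no basis for assuming they touch distinct storage.

```c
#include <stdint.h>

union my_type
{
    uint8_t m8[8];
    uint16_t m16[4];
    uint32_t m32[2];
    uint64_t m64;
};

/* Hypothetical variant of the modified func: the union type is visible. */
uint32_t bump_both(union my_type *u)
{
    u->m32[0] += 1;
    u->m16[0] += 1;   /* also changes two bytes of m32[0] */
    return u->m32[0]; /* has to reflect both stores */
}
```

The exact value returned depends on byte order, but it should differ from the value m32[0] held right after the first increment.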

空‖城人不在 2025-01-21 10:31:44

Passing m16 and m32 into func must violate something.

func(uint16_t *x, uint32_t *y) is free to assume *x and *y do not overlap, because x and y have sufficiently different pointer types. Since the referenced data does overlap in OP's code, we have a problem.

The special issues about unions and aliasing do not apply here in the body of func() as the union-ness of the calling code is lost.

Alternate "safe" code could have been:

// Use volatile to prevent folding these 2 lines of code.
// The key is that even with optimized code, 
// the sum must be done before *y assignment.
volatile uint32_t sum = *y + *x;
*y = sum;

return (int) (*y);
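
Another commonly used approach, which does not rely on volatile, is to copy the bytes in and out with memcpy. The following is only a sketch: the name add_first16_to_first32 is hypothetical, and it assumes storage points to at least sizeof(uint32_t) bytes. Which two bytes form the 16-bit value depends on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: storage must point to at least 4 readable,
   writable bytes. */
uint32_t add_first16_to_first32(void *storage)
{
    uint16_t x;
    uint32_t y;

    /* Copy the overlapping bytes into distinct local objects. */
    memcpy(&x, storage, sizeof x);
    memcpy(&y, storage, sizeof y);

    y += x;

    /* Write the sum back over the first four bytes. */
    memcpy(storage, &y, sizeof y);
    return y;
}
```

With the question's union, this could be called as add_first16_to_first32(&mine).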

What exactly am I violating by passing the pointers into func?

Passing pointers to overlapping data that the function func() is not obliged to account for.


Is type punning with arrays like this valid?

I do not see this as an array or union issue, just one of passing pointers to overlapping data that the function func() is not obliged to account for.

What other gotchas am I missing in this example?

Minor: int may be 16-bit, potentially causing implementation-defined behavior in the conversion of uint32_t to int.


Consider the difference between

uint32_t fun1(uint32_t *a, uint32_t *b);
uint32_t fun2(uint32_t * restrict a, uint32_t * restrict b);

fun1() would have to consider an overlap potential. fun2() would not.
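
To make the difference concrete, here is a small sketch with hypothetical bodies for the two declarations above (the original gives only the prototypes). In fun1 the final read has to observe the store through b if both pointers refer to the same object. In fun2 the compiler may assume they never do, so calling fun2 with overlapping arguments would be undefined.

```c
#include <stdint.h>

/* Hypothetical body: a and b may legitimately point to the same object. */
uint32_t fun1(uint32_t *a, uint32_t *b)
{
    *a = 1;
    *b = 2;
    return *a; /* must be reloaded, since *b may have changed it */
}

/* Hypothetical body: the caller promises a and b do not overlap. */
uint32_t fun2(uint32_t *restrict a, uint32_t *restrict b)
{
    *a = 1;
    *b = 2;
    return *a; /* the compiler may assume this is still 1 */
}
```

Calling fun1(&v, &v) is well defined and returns 2. fun2 should only be called with pointers to distinct objects, in which case it returns 1.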

血之狂魔 2025-01-21 10:31:44

My observations:

  • Assuming the arguments to func do not alias each other, func does not violate strict aliasing.

Not reliably true. The so-called strict-aliasing rule is expressed in terms of the lvalue used to access a given object, relative to that object's effective type. The two arguments to func() do not need to alias each other for execution of func() to produce a strict-aliasing violation. Example:

uint32_t x = 0, y = 1;
func((uint16_t *)&x, &y);
// func will violate strict aliasing when it dereferences its first parameter

Issues revolving around function parameters aliasing each other would be the realm of restrict-qualified pointers, which you are not using.


  • In C, it is permissible to use a union for type punning.

Yes, provided that the punning is performed via the union object. This is covered by C17 6.5/7, the aforementioned strict-aliasing rule:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

[...]

  • an aggregate or union type that includes one of the aforementioned types among its members

Note well that this isn't about the storage being accessed actually being inside a union object, but rather about the type of lvalue used to access it relative to the actual (effective) type of the object being accessed.


  • Passing m16 and m32 into func must violate something.

It does, though the language specification could be a lot clearer about that than it is. It does, however, say:

The value of at most one of the members can be stored in a union object at any time.

(C17 6.7.2.1/16)

In your particular example, neither mine.m16 nor mine.m32 has a value stored in it at the time of the call, but under any circumstance, at most one of them could have a value. When func then tries to read the values stored in those objects the results are not defined (because they don't actually have values stored in them).

That interpretation is supported by the inclusion in the spec of paragraph 6.5.2.3/6:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the completed type
of the union is visible.

No such special provision would be needed if it were generally ok to access random union members regardless of which one actually had a value stored.


My questions:

  • Is type punning with arrays like this valid?

Not like that, no. There are other variations on array type punning that are allowed by the spec.


  • What exactly am I violating by passing the pointers into func?

The call itself does not violate anything. It is legal to take the address of a union member, even one that does not currently have a value stored in it, and it is legal to pass the resulting pointer values to functions. But when called with those arguments, the function commits strict-aliasing violations when it attempts to dereference one or both pointers, as described above.


  • What other gotchas am I missing in this example?

Contrary to one of your other answers, the code presented does not run afoul of paragraph 6.5.16.1/3. The value being stored in *y is not read from overlapping object *x, but rather is the sum of that value with the original value of *y. That sum is computed, not read from an object, so 6.5.16.1/3 does not apply. But you may be missing that it would violate 6.5.16.1/3 if func() performed a simple assignment (*y = *x) instead of a compound assignment (+=).
