How does GCC (but not clang) make this optimization, deciding that a store to one struct member cannot affect a member of another struct?

Posted 2025-01-11 11:34:21

This is the code in question:

struct Cell
{
    Cell* U;
    Cell* D;
    void Detach();
};

void Cell::Detach()
{
    U->D = D;
    D->U = U;
}

clang-14 -O3 produces:

mov     rax, qword ptr [rdi]         <-- rax = U
mov     rcx, qword ptr [rdi + 8]     <-- rcx = D
mov     qword ptr [rax + 8], rcx     <-- U->D = D
mov     rcx, qword ptr [rdi + 8]     <-- this queries the D field again
mov     qword ptr [rcx], rax         <-- D->U = U

gcc 11.2 -O3 produces almost the same, but leaves out one mov:

mov     rdx, QWORD PTR [rdi]
mov     rax, QWORD PTR [rdi+8]
mov     QWORD PTR [rdx+8], rax
mov     QWORD PTR [rax], rdx

Clang reads the D field twice, while GCC reads it only once and re-uses it. Apparently GCC is not afraid of the first assignment changing anything that has an impact on the second assignment. I'm trying to understand if/when this is allowed.

Checking correctness gets a bit complicated when U or D point at themselves, each other and/or the same target.

My understanding is that the shorter code of GCC is correct if it is guaranteed that the pointers point at the beginning of a Cell (never inside it), regardless of which Cell it is.

Following this line of thought further, this is the case when a) Cells are always aligned to their size, and b) no custom manipulation of such a pointer occurs (referencing and arithmetic are fine).
I suspect case a) is guaranteed by the compiler, and case b) would require invoking undefined behavior of some sort, and as such can be ignored.
This would explain why GCC allows itself this optimization.

Is my reasoning correct? If so, why does clang not make the same optimization?

Answer by 半透明的墙, 2025-01-18 11:34:21

There are many potential optimizations in C and C++ that are usually safe, but aren't quite sound. If one regards the -> operator as being usable to build a standard-layout object without having to use placement new on it first (an abstraction model that is relied upon by a lot of code, whether or not the Standard mandates support), removing the if (mode) in the following C and C++ functions would be such an optimization.

C version:

struct s { int x,y; }; /* Assume int is 4 bytes, and struct is 8 */

void test(struct s *p1, struct s *p2, int mode)
{
    p1->y = 1;
    p2->x = 2;
    if (mode)
        p1->y = 1;            
}

C++ version:

#include <new>
struct s { int x,y; };
void test(void *vp1, void *vp2, int mode)
{
    if (1)
    {
        struct s* p1 = new (vp1) struct s;
        p1->y = 1;
    }
    if (1)
    {
        struct s* p2 = new (vp2) struct s;
        p2->x = 2;
    }
    if (mode)
    {
        struct s* p3 = new (vp1) struct s;
        p3->y = 1;
    }
}

The optimization would be correct unless the address in p2 is four bytes higher than p1. Under the "traditional" abstraction model used in C or C++, if the address of p1 happens to be 0x1000 and that of p2 happens to be 0x1004, the first assignment would cause addresses 0x1000-0x1007 to hold a struct s, if it didn't already, whose second member (at address 0x1004) would equal 1. The second assignment, by overwriting that object, would end its lifetime and cause addresses 0x1004 to 0x100B to hold a struct s whose first member would equal 2. The third assignment, if executed, would end the lifetime of that second object and re-create the first.

If the third assignment is executed, there would be an object at address 0x1000 whose second field (at address 0x1004) would hold the readable value 1. If the assignment is skipped, there would be an object at address 0x1004 whose first field would hold the value 2. Behavior would be defined in both cases, and a compiler that didn't know which case would apply would have to accommodate both of them by making the value at 0x1004 depend upon mode.

As it happens, the authors of clang do not seem to have provided for that corner case, and thus omit the conditional check. I think the Standard should use an abstraction model that would allow such optimization while still supporting the common structure-creation pattern in situations that don't involve weird aliasing corner cases, but I don't see any way of interpreting the Standard that would allow such optimization without also allowing compilers to arbitrarily break a large amount of existing code.

I don't think there's any general way of knowing whether a decision by gcc or clang not to apply a particular optimization reflects a recognition that there are corner cases where the optimization would be incorrect, together with an inability to prove that none of them apply, or whether it simply reflects an oversight which may later be "corrected" so as to replace correct behavior with an unsound optimization.
