gcc、严格别名和通过联合进行转换

发布于 2024-09-03 02:28:52 字数 4588 浏览 4 评论 0原文

你有什么恐怖故事要讲吗? GCC 手册最近添加了有关 -fstrict-aliasing 和通过联合强制转换指针的警告:

[...] 获取地址、转换结果指针并取消引用结果具有未定义行为 [强调],即使转换使用联合类型,例如:

    union a_union {
        int i;
        double d;
    };

    int f() {
        double d = 3.0;
        return ((union a_union *)&d)->i;
    }

有人有例子来说明这种未定义的行为吗?

请注意,这个问题与 C99 标准所说或未说的内容无关。它是关于当今 gcc 和其他现有编译器的实际功能。

我只是猜测,但一个潜在的问题可能在于将 d 设置为 3.0。因为 d 是一个临时变量,永远不会直接读取,也永远不会通过“某种程度上兼容”的指针读取,所以编译器可能不会费心去设置它。然后 f() 将从堆栈中返回一些垃圾。

我简单、天真的尝试失败了。例如:

#include <stdio.h>

union a_union {
    int i;
    double d;
};

int f1(void) {
    union a_union t;
    t.d = 3333333.0;
    return t.i; // gcc manual: 'type-punning is allowed, provided...' (C90 6.3.2.3)
}

int f2(void) {
    double d = 3333333.0;
    return ((union a_union *)&d)->i; // gcc manual: 'undefined behavior' 
}

int main(void) {
    printf("%d\n", f1());
    printf("%d\n", f2());
    return 0;
}

工作正常,在 CYGWIN 上给出:

-2147483648
-2147483648

查看汇编器,我们看到 gcc 完全优化了 tf1() 只是存储预先计算的答案:

movl    $-2147483648, %eax

while f2() 将 3333333.0 压入浮点堆栈,然后提取返回值:

flds   LC0                 # LC0: 1246458708 (= 3333333.0) (--> 80 bits)
fstpl  -8(%ebp)            # save in d (64 bits)
movl   -8(%ebp), %eax      # return value (32 bits)

并且函数也是内联的(这似乎是一些微妙的严格别名错误的原因)但这与这里无关。 (这个汇编器并不是那么相关,但它添加了确凿的细节。)

还要注意,获取地址显然是错误的(或者正确,如果您试图说明未定义的行为)。例如,正如我们知道这是错误的:

extern void foo(int *, double *);
union a_union t;
t.d = 3.0;
foo(&t.i, &t.d); // undefined behavior

我们同样知道这是错误的:

extern void foo(int *, double *);
double d = 3.0;
foo(&((union a_union *)&d)->i, &d); // undefined behavior

有关此问题的背景讨论,请参见示例:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1422.pdf
http://gcc.gnu.org/ml/gcc/2010-01 /msg00013.html
http://davmac.wordpress.com/2010/02/26/c99-重新审视/
http://cellperformance.beyond3d.com/articles/2006/06 /understanding-strict-aliasing.html
( = 搜索Google 上的页面,然后查看缓存页面 )

严格别名规则是什么?< /a>
C++ (GCC) 中的 C99 严格别名规则

第一个链接,七个月前 ISO 会议的草稿记录,一位与会者在第 4.16 节中指出:

有人认为规则足够明确吗?没有人能够真正解释它们。

其他说明:我的测试是使用 gcc 4.3.4,使用 -O2;选项 -O2 和 -O3 暗示 -fstrict-aliasing。 GCC 手册中的示例假设 sizeof(double) >= sizeof(int);即使它们不平等也没关系。

此外,正如 Mike Acton 在 cellperformace 链接中指出的那样,-Wstrict-aliasing=2(但不是=3)会产生警告:取消引用类型双关指针可能会违反此处示例的严格别名规则

Do you have any horror stories to tell? The GCC Manual recently added a warning regarding -fstrict-aliasing and casting a pointer through a union:

[...] Taking the address, casting the resulting pointer and dereferencing the result has undefined behavior [emphasis added], even if the cast uses a union type, e.g.:

    union a_union {
        int i;
        double d;
    };

    int f() {
        double d = 3.0;
        return ((union a_union *)&d)->i;
    }

Does anyone have an example to illustrate this undefined behavior?

Note this question is not about what the C99 standard says, or does not say. It is about the actual functioning of gcc, and other existing compilers, today.

I am only guessing, but one potential problem may lie in the setting of d to 3.0. Because d is a temporary variable which is never directly read, and which is never read via a 'somewhat-compatible' pointer, the compiler may not bother to set it. And then f() will return some garbage from the stack.

My simple, naive, attempt fails. For example:

#include <stdio.h>

union a_union {
    int i;
    double d;
};

int f1(void) {
    union a_union t;
    t.d = 3333333.0;
    return t.i; // gcc manual: 'type-punning is allowed, provided...' (C90 6.3.2.3)
}

int f2(void) {
    double d = 3333333.0;
    return ((union a_union *)&d)->i; // gcc manual: 'undefined behavior' 
}

int main(void) {
    printf("%d\n", f1());
    printf("%d\n", f2());
    return 0;
}

works fine, giving on CYGWIN:

-2147483648
-2147483648

Looking at the assembler, we see that gcc completely optimizes t away: f1() simply stores the pre-calculated answer:

movl    $-2147483648, %eax

while f2() pushes 3333333.0 onto the floating-point stack, and then extracts the return value:

flds   LC0                 # LC0: 1246458708 (= 3333333.0) (--> 80 bits)
fstpl  -8(%ebp)            # save in d (64 bits)
movl   -8(%ebp), %eax      # return value (32 bits)

And the functions are also inlined (which seems to be the cause of some subtle strict-aliasing bugs) but that is not relevant here. (And this assembler is not that relevant, but it adds corroborative detail.)

Also note that taking addresses is obviously wrong (or right, if you are trying to illustrate undefined behavior). For example, just as we know this is wrong:

extern void foo(int *, double *);
union a_union t;
t.d = 3.0;
foo(&t.i, &t.d); // undefined behavior

we likewise know this is wrong:

extern void foo(int *, double *);
double d = 3.0;
foo(&((union a_union *)&d)->i, &d); // undefined behavior

For background discussion about this, see for example:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1422.pdf
http://gcc.gnu.org/ml/gcc/2010-01/msg00013.html
http://davmac.wordpress.com/2010/02/26/c99-revisited/
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
( = search page on Google then view cached page )

What is the strict aliasing rule?
C99 strict aliasing rules in C++ (GCC)

In the first link, draft minutes of an ISO meeting seven months ago, one participant notes in section 4.16:

Is there anybody that thinks the rules are clear enough? No one is really able to interpret them.

Other notes: My test was with gcc 4.3.4, with -O2; options -O2 and -O3 imply -fstrict-aliasing. The example from the GCC Manual assumes sizeof(double) >= sizeof(int); it doesn't matter if they are unequal.

Also, as noted by Mike Acton in the cellperformace link, -Wstrict-aliasing=2, but not =3, produces warning: dereferencing type-punned pointer might break strict-aliasing rules for the example here.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

北座城市 2024-09-10 02:28:53

GCC 对工会发出警告的事实并不一定意味着工会目前不起作用。但这里有一个比你的例子稍微简单一点的例子:

#include <stdio.h>

struct B {
    int i1;
    int i2;
};

union A {
    struct B b;
    double d;
};

int main() {
    double d = 3.0;
    #ifdef USE_UNION
        ((union A*)&d)->b.i2 += 0x80000000;
    #else
        ((int*)&d)[1] += 0x80000000;
    #endif
    printf("%g\n", d);
}

输出:

$ gcc --version
gcc (GCC) 4.3.4 20090804 (release) 1
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -oalias alias.c -O1 -std=c99 && ./alias
-3

$ gcc -oalias alias.c -O3 -std=c99 && ./alias
3

$ gcc -oalias alias.c -O1 -std=c99 -DUSE_UNION && ./alias
-3

$ gcc -oalias alias.c -O3 -std=c99 -DUSE_UNION && ./alias
-3

所以在 GCC 4.3.4 上,联合“拯救了世界”(假设我想要输出“-3”)。它禁用依赖于严格别名的优化,并在第二种情况下(仅)导致输出“3”。使用 -Wall,USE_UNION 还会禁用类型双关警告。

我没有 gcc 4.4 来测试,但请尝试一下这段代码。您的代码实际上测试了 d 的内存在读回联合之前是否已初始化:我的代码测试它是否被修改。

顺便说一句,将 double 的一半读取为 int 的安全方法是:

double d = 3;
int i;
memcpy(&i, &d, sizeof i);
return i;

通过 GCC 优化,这会导致:

    int thing() {
401130:       55                      push   %ebp
401131:       89 e5                   mov    %esp,%ebp
401133:       83 ec 10                sub    $0x10,%esp
        double d = 3;
401136:       d9 05 a8 20 40 00       flds   0x4020a8
40113c:       dd 5d f0                fstpl  -0x10(%ebp)
        int i;
        memcpy(&i, &d, sizeof i);
40113f:       8b 45 f0                mov    -0x10(%ebp),%eax
        return i;
    }
401142:       c9                      leave
401143:       c3                      ret

因此没有实际调用 memcpy。如果你不这样做,那么如果工会演员停止在海湾合作委员会工作,你就应该得到你所得到的;-)

The fact that GCC is warning about unions doesn't necessarily mean that unions don't currently work. But here's a slightly less simple example than yours:

#include <stdio.h>

struct B {
    int i1;
    int i2;
};

union A {
    struct B b;
    double d;
};

int main() {
    double d = 3.0;
    #ifdef USE_UNION
        ((union A*)&d)->b.i2 += 0x80000000;
    #else
        ((int*)&d)[1] += 0x80000000;
    #endif
    printf("%g\n", d);
}

Output:

$ gcc --version
gcc (GCC) 4.3.4 20090804 (release) 1
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -oalias alias.c -O1 -std=c99 && ./alias
-3

$ gcc -oalias alias.c -O3 -std=c99 && ./alias
3

$ gcc -oalias alias.c -O1 -std=c99 -DUSE_UNION && ./alias
-3

$ gcc -oalias alias.c -O3 -std=c99 -DUSE_UNION && ./alias
-3

So on GCC 4.3.4, the union "saves the day" (assuming I want the output "-3"). It disables the optimisation that relies on strict aliasing and that results in the output "3" in the second case (only). With -Wall, USE_UNION also disables the type-pun warning.

I don't have gcc 4.4 to test, but please give this code a go. Your code in effect tests whether the memory for d is initialised before reading back through a union: mine tests whether it is modified.

Btw, the safe way to read half of a double as an int is:

double d = 3;
int i;
memcpy(&i, &d, sizeof i);
return i;

With optimisation on GCC, this results in:

    int thing() {
401130:       55                      push   %ebp
401131:       89 e5                   mov    %esp,%ebp
401133:       83 ec 10                sub    $0x10,%esp
        double d = 3;
401136:       d9 05 a8 20 40 00       flds   0x4020a8
40113c:       dd 5d f0                fstpl  -0x10(%ebp)
        int i;
        memcpy(&i, &d, sizeof i);
40113f:       8b 45 f0                mov    -0x10(%ebp),%eax
        return i;
    }
401142:       c9                      leave
401143:       c3                      ret

So there's no actual call to memcpy. If you aren't doing this, you deserve what you get if union casts stop working in GCC ;-)

天邊彩虹 2024-09-10 02:28:53

好吧,这有点死人贴,但这是一个恐怖故事。我正在移植一个假设本机字节顺序是大端字节序而编写的程序。现在我也需要它在小端上工作。不幸的是,我不能在任何地方都使用本机字节顺序,因为可以通过多种方式访问​​数据。例如,一个 64 位整数可以被视为两个 32 位整数或 4 个 16 位整数,甚至被视为 16 个 4 位整数。更糟糕的是,无法弄清楚内存中到底存储了什么,因为软件是某种字节码的解释器,而数据是由该字节码形成的。例如,字节代码可能包含写入 16 位整数数组的指令,然后将其中的一对作为 32 位浮点数进行访问。并且无法预测它或更改字节码。

因此,我必须创建一组包装类来处理以大端顺序存储的值,而不管本机端序如何。在 Visual Studio 和 Linux 上的 GCC 中完美运行,无需优化。但有了 gcc -O2,一切就乱了套。经过大量调试后,我发现原因就在这里:

double D;
float F; 
Ul *pF=(Ul*)&F; // Ul is unsigned long
*pF=pop0->lu.r(); // r() returns Ul
D=(double)F; 

此代码用于将存储在 32 位整数中的浮点数的 32 位表示转换为双精度。看来编译器决定在对 D 赋值之后对 *pF 进行赋值 - 结果是第一次执行代码时,D 的值是垃圾,并且后续值“晚”了 1 次迭代。

奇迹般的是,当时没有其他问题。因此,我决定继续在原始平台 HP-UX 上测试我的新代码,该平台是在具有本机大端顺序的 RISC 处理器上。现在它又坏了,这次是在我的新类中:

typedef unsigned long long Ur; // 64-bit uint
typedef unsigned char Uc;
class BEDoubleRef {
        double *p;
public:
        inline BEDoubleRef(double *p): p(p) {}
        inline operator double() {
                Uc *pu = reinterpret_cast<Uc*>(p);
                Ur n = (pu[7] & 0xFFULL) | ((pu[6] & 0xFFULL) << 8)
                        | ((pu[5] & 0xFFULL) << 16) | ((pu[4] & 0xFFULL) << 24)
                        | ((pu[3] & 0xFFULL) << 32) | ((pu[2] & 0xFFULL) << 40)
                        | ((pu[1] & 0xFFULL) << 48) | ((pu[0] & 0xFFULL) << 56);
                return *reinterpret_cast<double*>(&n);
        }
        inline BEDoubleRef &operator=(const double &d) {
                Uc *pc = reinterpret_cast<Uc*>(p);
                const Ur *pu = reinterpret_cast<const Ur*>(&d);
                pc[0] = (*pu >> 56) & 0xFFu;
                pc[1] = (*pu >> 48) & 0xFFu;
                pc[2] = (*pu >> 40) & 0xFFu;
                pc[3] = (*pu >> 32) & 0xFFu;
                pc[4] = (*pu >> 24) & 0xFFu;
                pc[5] = (*pu >> 16) & 0xFFu;
                pc[6] = (*pu >> 8) & 0xFFu;
                pc[7] = *pu & 0xFFu;
                return *this;
        }
        inline BEDoubleRef &operator=(const BEDoubleRef &d) {
                *p = *d.p;
                return *this;
        }
};

由于一些非常奇怪的原因,第一个赋值运算符只正确分配了字节 1 到 7。字节 0 总是有一些无意义的东西,这破坏了一切,因为有一个符号位和一个订单的一部分。

我尝试使用工会作为解决方法:

union {
    double d;
    Uc c[8];
} un;
Uc *pc = un.c;
const Ur *pu = reinterpret_cast<const Ur*>(&d);
pc[0] = (*pu >> 56) & 0xFFu;
pc[1] = (*pu >> 48) & 0xFFu;
pc[2] = (*pu >> 40) & 0xFFu;
pc[3] = (*pu >> 32) & 0xFFu;
pc[4] = (*pu >> 24) & 0xFFu;
pc[5] = (*pu >> 16) & 0xFFu;
pc[6] = (*pu >> 8) & 0xFFu;
pc[7] = *pu & 0xFFu;
*p = un.d;

但它也不起作用。事实上,它更好一点——它只在负数时失败。

此时,我正在考虑添加一个简单的本机字节序测试,然后通过 char* 指针和 if (LITTLE_ENDIAN) 检查来完成所有操作。更糟糕的是,该程序大量使用了联合体,目前看来工作正常,但在经历了这些混乱之后,如果它突然无缘无故地崩溃,我不会感到惊讶。

Well it's a bit of necro-posting, but here is a horror story. I'm porting a program that was written with the assumption that the native byte order is big endian. Now I need it to work on little endian too. Unfortunately, I can't just use native byte order everywhere, as data could be accessed in many ways. For example, a 64-bit integer could be treated as two 32-bit integers or as 4 16-bit integers, or even as 16 4-bit integers. To make things worse, there is no way to figure out what exactly is stored in memory, because the software is an interpreter for some sort of byte code, and the data is formed by that byte code. For example, the byte code may contain instructions to write an array of 16-bit integers, and then access a pair of them as a 32-bit float. And there is no way to predict it or alter the byte code.

Therefore, I had to create a set of wrapper classes to work with values stored in the big endian order regardless of the native endianness. Worked perfectly in Visual Studio and in GCC on Linux with no optimizations. But with gcc -O2, hell broke loose. After a lot of debugging I figured out that the reason was here:

double D;
float F; 
Ul *pF=(Ul*)&F; // Ul is unsigned long
*pF=pop0->lu.r(); // r() returns Ul
D=(double)F; 

This code was used to convert a 32-bit representation of a float stored in a 32-bit integer to double. It seems that the compiler decided to do the assignment to *pF after the assignment to D - the result was that the first time the code was executed, the value of D was garbage, and the consequent values were "late" by 1 iteration.

Miraculously, there were no other problems at that point. So I decided to move on and test my new code on the original platform, HP-UX on a RISC processor with native big endian order. Now it broke again, this time in my new class:

typedef unsigned long long Ur; // 64-bit uint
typedef unsigned char Uc;
class BEDoubleRef {
        double *p;
public:
        inline BEDoubleRef(double *p): p(p) {}
        inline operator double() {
                Uc *pu = reinterpret_cast<Uc*>(p);
                Ur n = (pu[7] & 0xFFULL) | ((pu[6] & 0xFFULL) << 8)
                        | ((pu[5] & 0xFFULL) << 16) | ((pu[4] & 0xFFULL) << 24)
                        | ((pu[3] & 0xFFULL) << 32) | ((pu[2] & 0xFFULL) << 40)
                        | ((pu[1] & 0xFFULL) << 48) | ((pu[0] & 0xFFULL) << 56);
                return *reinterpret_cast<double*>(&n);
        }
        inline BEDoubleRef &operator=(const double &d) {
                Uc *pc = reinterpret_cast<Uc*>(p);
                const Ur *pu = reinterpret_cast<const Ur*>(&d);
                pc[0] = (*pu >> 56) & 0xFFu;
                pc[1] = (*pu >> 48) & 0xFFu;
                pc[2] = (*pu >> 40) & 0xFFu;
                pc[3] = (*pu >> 32) & 0xFFu;
                pc[4] = (*pu >> 24) & 0xFFu;
                pc[5] = (*pu >> 16) & 0xFFu;
                pc[6] = (*pu >> 8) & 0xFFu;
                pc[7] = *pu & 0xFFu;
                return *this;
        }
        inline BEDoubleRef &operator=(const BEDoubleRef &d) {
                *p = *d.p;
                return *this;
        }
};

For some really weird reason, the first assignment operator only correctly assigned bytes 1 through 7. Byte 0 always had some nonsense in it, which broke everything as there is a sign bit and a part of order.

I have tried to use unions as a workaround:

union {
    double d;
    Uc c[8];
} un;
Uc *pc = un.c;
const Ur *pu = reinterpret_cast<const Ur*>(&d);
pc[0] = (*pu >> 56) & 0xFFu;
pc[1] = (*pu >> 48) & 0xFFu;
pc[2] = (*pu >> 40) & 0xFFu;
pc[3] = (*pu >> 32) & 0xFFu;
pc[4] = (*pu >> 24) & 0xFFu;
pc[5] = (*pu >> 16) & 0xFFu;
pc[6] = (*pu >> 8) & 0xFFu;
pc[7] = *pu & 0xFFu;
*p = un.d;

but it didn't work either. In fact, it was a bit better - it only failed for negative numbers.

At this point I'm thinking about adding a simple test for native endianness, then doing everything via char* pointers with if (LITTLE_ENDIAN) checks around. To make things worse, the program makes heavy use of unions all around, which seems to work ok for now, but after all this mess I won't be surprised if it suddenly breaks for no apparent reason.

把人绕傻吧 2024-09-10 02:28:53

您认为以下代码是“错误”的断言:

extern void foo(int *, double *);
union a_union t;
t.d = 3.0;
foo(&t.i, &t.d); // undefined behavior

...是错误的。仅获取两个联合成员的地址并将它们传递给外部函数不会导致未定义的行为;您只能通过以无效方式取消引用其中一个指针来获得这一点。例如,如果函数 foo 立即返回而不取消引用您传递给它的指针,则该行为不是未定义的。通过严格阅读 C99 标准,甚至在某些情况下可以取消引用指针而不调用未定义的行为;例如,它可以读取第二个指针引用的值,然后通过第一个指针存储一个值,只要它们都指向动态分配的对象(即没有“声明类型”的对象)。

Your assertion that the following code is "wrong":

extern void foo(int *, double *);
union a_union t;
t.d = 3.0;
foo(&t.i, &t.d); // undefined behavior

... is wrong. Just taking the address of the two union members and passing them to an external function doesn't result in undefined behaviour; you only get that from dereferencing one of those pointers in an invalid way. For instance if the function foo returns immediately without dereferencing the pointers you passed it, then the behaviour is not undefined. With a strict reading of the C99 standard, there are even some cases where the pointers can be dereferenced without invoking undefined behaviour; for instance, it could read the value referenced by the second pointer, and then store a value through the first pointer, as long as they both point to a dynamically allocated object (i.e. one without a "declared type").

风流物 2024-09-10 02:28:53

当编译器有两个不同的指针指向同一块内存时,就会发生别名。通过对指针进行类型转换,您将生成一个新的临时指针。例如,如果优化器重新排序汇编指令,则访问两个指针可能会给出两个完全不同的结果 - 它可能会在写入同一地址之前重新排序读取。这就是为什么它是未定义的行为。

您不太可能在非常简单的测试代码中看到这个问题,但是当发生很多事情时它就会出现。

我认为这个警告是为了明确工会并不是一个特例,尽管你可能认为它们是特例。

有关别名的更多信息,请参阅此维基百科文章:http://en.wikipedia.org /wiki/Aliasing_(计算)#Conflicts_with_optimization

Aliasing occurs when the compiler has two different pointers to the same piece of memory. By typecasting a pointer, you're generating a new temporary pointer. If the optimizer reorders the assembly instructions for example, accessing the two pointers might give two totally different results - it might reorder a read before a write to the same address. This is why it is undefined behavior.

You are unlikely to see the problem in very simple test code, but it will appear when there's a lot going on.

I think the warning is to make clear that unions are not a special case, even though you might expect them to be.

See this Wikipedia article for more information about aliasing: http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization

叹梦 2024-09-10 02:28:53

你见过这个吗?
什么是严格别名规则?

该链接包含指向本文的辅助链接与海湾合作委员会的例子。 http://cellperformance.beyond3d.com/articles/2006/ 06/understanding-strict-aliasing.html

尝试这样的联合会更接近问题。

union a_union {
    int i;
    double *d;
};

这样你就有了两种类型,一个 int 和一个 double* 指向相同的内存。在这种情况下,使用 double (*(double*)&i) 可能会导致问题。

Have you seen this ?
What is the strict aliasing rule?

The link contains a secondary link to this article with gcc examples. http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

Trying a union like this would be closer to the problem.

union a_union {
    int i;
    double *d;
};

That way you have 2 types, an int and a double* pointing to the same memory. In this case using the double (*(double*)&i) could cause the problem.

许久 2024-09-10 02:28:53

这是我的:认为这是所有 GCC v5.x 及更高版本中的错误

#include <iostream>
#include <complex>
#include <pmmintrin.h>

template <class Scalar_type, class Vector_type>
class simd {
 public:
  typedef Vector_type vector_type;
  typedef Scalar_type scalar_type;
  typedef union conv_t_union {
    Vector_type v;
    Scalar_type s[sizeof(Vector_type) / sizeof(Scalar_type)];
    conv_t_union(){};
  } conv_t;

  static inline constexpr int Nsimd(void) {
    return sizeof(Vector_type) / sizeof(Scalar_type);
  }

  Vector_type v;

  template <class functor>
  friend inline simd SimdApply(const functor &func, const simd &v) {
    simd ret;
    simd::conv_t conv;

    conv.v = v.v;
    for (int i = 0; i < simd::Nsimd(); i++) {
      conv.s[i] = func(conv.s[i]);
    }
    ret.v = conv.v;
    return ret;
  }

};

template <class scalar>
struct RealFunctor {
  scalar operator()(const scalar &a) const {
    return std::real(a);
  }
};

template <class S, class V>
inline simd<S, V> real(const simd<S, V> &r) {
  return SimdApply(RealFunctor<S>(), r);
}



typedef simd<std::complex<double>, __m128d> vcomplexd;

int main(int argc, char **argv)
{
  vcomplexd a,b;
  a.v=_mm_set_pd(2.0,1.0);
  b = real(a);

  vcomplexd::conv_t conv;
  conv.v = b.v;
  for(int i=0;i<vcomplexd::Nsimd();i++){
    std::cout << conv.s[i]<<" ";
  }
  std::cout << std::endl;
}

应该给出

c010200:~ peterboyle$ g++-mp-5 Gcc-test.cc -std=c++11 
c010200:~ peterboyle$ ./a.out 
(1,0) 

但是在 -O3 下:我认为这是错误的并且是编译器错误

c010200:~ peterboyle$ g++-mp-5 Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(0,0) 

在 g++4.9 下

c010200:~ peterboyle$ g++-4.9 Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(1,0) 

在 llvm xcode 下

c010200:~ peterboyle$ g++ Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(1,0) 

Here is mine: In think this is a bug in all GCC v5.x and later

#include <iostream>
#include <complex>
#include <pmmintrin.h>

template <class Scalar_type, class Vector_type>
class simd {
 public:
  typedef Vector_type vector_type;
  typedef Scalar_type scalar_type;
  typedef union conv_t_union {
    Vector_type v;
    Scalar_type s[sizeof(Vector_type) / sizeof(Scalar_type)];
    conv_t_union(){};
  } conv_t;

  static inline constexpr int Nsimd(void) {
    return sizeof(Vector_type) / sizeof(Scalar_type);
  }

  Vector_type v;

  template <class functor>
  friend inline simd SimdApply(const functor &func, const simd &v) {
    simd ret;
    simd::conv_t conv;

    conv.v = v.v;
    for (int i = 0; i < simd::Nsimd(); i++) {
      conv.s[i] = func(conv.s[i]);
    }
    ret.v = conv.v;
    return ret;
  }

};

template <class scalar>
struct RealFunctor {
  scalar operator()(const scalar &a) const {
    return std::real(a);
  }
};

template <class S, class V>
inline simd<S, V> real(const simd<S, V> &r) {
  return SimdApply(RealFunctor<S>(), r);
}



typedef simd<std::complex<double>, __m128d> vcomplexd;

int main(int argc, char **argv)
{
  vcomplexd a,b;
  a.v=_mm_set_pd(2.0,1.0);
  b = real(a);

  vcomplexd::conv_t conv;
  conv.v = b.v;
  for(int i=0;i<vcomplexd::Nsimd();i++){
    std::cout << conv.s[i]<<" ";
  }
  std::cout << std::endl;
}

Should give

c010200:~ peterboyle$ g++-mp-5 Gcc-test.cc -std=c++11 
c010200:~ peterboyle$ ./a.out 
(1,0) 

But under -O3: I THINK THIS IS WRONG AND A COMPILER ERROR

c010200:~ peterboyle$ g++-mp-5 Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(0,0) 

Under g++4.9

c010200:~ peterboyle$ g++-4.9 Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(1,0) 

Under llvm xcode

c010200:~ peterboyle$ g++ Gcc-test.cc -std=c++11 -O3 
c010200:~ peterboyle$ ./a.out 
(1,0) 
冷弦 2024-09-10 02:28:53

我不太明白你的问题。编译器完全按照您的示例执行的操作。 union 转换就是您在 f1 中所做的。在f2中,它是一个普通的指针类型转换,您将其转换为联合是无关紧要的,它仍然是一个指针转换

I don't really understand your problem. The compiler did exactly what it was supposed to do in your example. The union conversion is what you did in f1. In f2 it's a normal pointer typecast, that you casted it to a union is irrelevant, it's still a pointer casting

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文