Segmentation fault using OpenMP and SSE
I'm just getting started experimenting with adding OpenMP to some SSE code.
My first test program SOMETIMES crashes in _mm_set_ps, but works when I change the clause to if (0).
It looks so simple that I must be missing something obvious.
I'm compiling with gcc -fopenmp -g -march=core2 -pthreads
#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>
int main()
{
    #pragma omp parallel if (1)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                __m128 x1 = _mm_set_ps ( 1.1f, 2.1f, 3.1f, 4.1f );
            }
            #pragma omp section
            {
                __m128 x2 = _mm_set_ps ( 1.2f, 2.2f, 3.2f, 4.2f );
            }
        } // end omp sections
    } // end omp parallel
    return 0;
}
Comments (2)
This is a bug in the OpenMP implementation. I had the same problem with gcc on Windows (MinGW). The -mstackrealign command-line option solved it for me. It adds instructions to the prologue of every function to realign the stack to a 16-byte boundary. I didn't notice any performance penalty.
You can also try adding __attribute__ ((force_align_arg_pointer)) to a function declaration, which should do the same thing, but only for that function. You might have to put the SSE code in a separate function that you then call from the function containing the #pragma omp, so that the stack has a chance to be realigned.
The problem went away when I moved to compiling for a 64-bit target (MinGW64, such as the TDM GCC build).
I am experimenting with AVX instructions, which require 32-byte alignment, but GCC doesn't support that on Windows at all. This forced me to fix the generated assembly code with a Python script, but it works.
I smell an unaligned memory access. It's the only way code like that could blow up (assuming that is the only code there). For that to happen, the values would have to be kept in stack memory rather than in XMM registers, and that stack memory is only aligned to 4 bytes; my guess is that the OpenMP code is messing up the alignment of the stack.