Using multiple pragmas on the same loop for auto-vectorization on GCC and ICC
When there is a simple loop running on simple arrays,
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
GCC and ICC behave differently with pragmas. So I experimented with pragmas and observed that ICC benefits from this:
#pragma vector always vectorlength(16)
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
and GCC benefits from this:
#pragma GCC ivdep
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
What is the right approach to support both compilers? Something like this:
#pragma vector always vectorlength(16)
#pragma GCC ivdep
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
Or should I use #define macros? (I'm not fond of macros, but I can use them if there is no other option.)
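For illustration, the macro option could look roughly like the sketch below. It uses the standard `_Pragma` operator (C99/C++11) so that each compiler only ever sees its own hint. The macro name `LOOP_VEC_HINT` and the function `run_add16` are hypothetical names made up for this example, and the sketch assumes the classic ICC compiler, which defines `__INTEL_COMPILER`. Whether either hint actually changes the generated code is something to check in each compiler's vectorization report.

```cpp
// LOOP_VEC_HINT is a hypothetical name, not part of any library.
// __INTEL_COMPILER is tested first because classic ICC also defines __GNUC__.
#if defined(__INTEL_COMPILER)
  #define LOOP_VEC_HINT _Pragma("vector always vectorlength(16)")
#elif defined(__GNUC__) && !defined(__clang__)
  #define LOOP_VEC_HINT _Pragma("GCC ivdep")
#else
  #define LOOP_VEC_HINT /* no hint for other compilers */
#endif

// Fills b and c, runs the hinted loop, and returns the sum of a.
inline float run_add16() {
    alignas(64) float a[16];
    alignas(64) float b[16];
    alignas(64) float c[16];
    for (int i = 0; i < 16; i++) {
        b[i] = static_cast<float>(i);
        c[i] = 1.0f;
    }

    LOOP_VEC_HINT  // must sit directly before the for loop
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] + c[i];
    }

    float sum = 0.0f;
    for (int i = 0; i < 16; i++) {
        sum += a[i];
    }
    return sum;
}
```

Keeping the selection in one macro means the loop body stays identical for every compiler, and neither compiler has to parse (or warn about) the other's pragma.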
I'm trying to support #pragma omp simd safelen(16) for platforms that do not have OpenMP. The closest pragmas I have found are GCC ivdep and vector always, but they are still not as fast as the OpenMP pragma. I am probably missing some other pragmas.
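One possible way to express "OpenMP where available, compiler-specific hints otherwise" is sketched below. It assumes that the standard `_OPENMP` macro is defined when OpenMP is enabled and that a value of at least 201307 (OpenMP 4.0) means `omp simd` is accepted. `SIMD_LOOP_16`, `add16`, and `demo_add16_sum` are hypothetical names for this example only. It does not address the performance gap described above, only the selection logic.

```cpp
// SIMD_LOOP_16 is a hypothetical name. Prefer the OpenMP directive when
// OpenMP 4.0+ is enabled, otherwise fall back to a compiler-specific hint.
#if defined(_OPENMP) && _OPENMP >= 201307
  #define SIMD_LOOP_16 _Pragma("omp simd safelen(16)")
#elif defined(__INTEL_COMPILER)
  #define SIMD_LOOP_16 _Pragma("vector always vectorlength(16)")
#elif defined(__GNUC__) && !defined(__clang__)
  #define SIMD_LOOP_16 _Pragma("GCC ivdep")
#else
  #define SIMD_LOOP_16 /* plain loop */
#endif

// The hints assert that a does not overlap b or c; callers must ensure that.
inline void add16(float* a, const float* b, const float* c) {
    SIMD_LOOP_16
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] + c[i];
    }
}

// Small driver: b[i] = i, c[i] = 2i, so a[i] = 3i; returns the sum of a.
inline float demo_add16_sum() {
    alignas(64) float a[16];
    alignas(64) float b[16];
    alignas(64) float c[16];
    for (int i = 0; i < 16; i++) {
        b[i] = static_cast<float>(i);
        c[i] = static_cast<float>(2 * i);
    }
    add16(a, b, c);
    float sum = 0.0f;
    for (int i = 0; i < 16; i++) {
        sum += a[i];
    }
    return sum;
}
```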
- a, b and c are simple arrays on the same stack, and they are aligned to 64.
- The function has __attribute__((always_inline)), which gives ICC a 4x speedup (though it is still 50% slower than GCC).
- ICC flags:
-std=c++14 -xCORE-AVX512 -qopt-zmm-usage=high -O3 -lgomp -fmath-errno -mprefer-vector-width=512 -ftree-vectorize -lpthread
- GCC flags:
-std=c++14 -march=cascadelake -fmath-errno -mavx512f -O3 -lgomp -mprefer-vector-width=512 -ftree-vectorize -lpthread
Lastly, why is there no equivalent of #pragma vector always for GCC?