When I profile the code I am working on, one particular hot spot is the following loop:
for(int loc = start; loc<end; ++loc)
y[loc]+=a[offset+loc]*x[loc+d];
where the arrays y, a, and x do not overlap. It seems to me that a loop like this should be easy to vectorize. However, when I compile with g++ using the options "-O3 -ftree-vectorize -ftree-vectorizer-verbose=1", I get no indication that this particular loop was vectorized. Meanwhile, a loop occurring just before the code above:
for(int i=0; i<m; ++i)
y[i]=0;
does get vectorized according to the output. Any thoughts on why the first loop is not vectorized, or how I might be able to fix this? (I am not all that educated on the concept of vectorization, so I am likely missing something quite obvious)
As per Oli's suggestion, turning up the verbosity yields the following notes (while I am usually good at reading compiler warnings/errors/output, I have no idea what this means):
./include/mv_ops.h:89: note: dependence distance = 0.
./include/mv_ops.h:89: note: accesses have the same alignment.
./include/mv_ops.h:89: note: dependence distance modulo vf == 0 between *D.50620_89 and *D.50620_89
./include/mv_ops.h:89: note: not vectorized: can't determine dependence between *D.50623_98 and *D.50620_89
You need to tell the compiler that x, y, and a do not overlap. In C/C++ terms, that means telling the compiler that those pointers do not alias by declaring them with restrict (the C99 keyword) or __restrict (the spelling g++ accepts as an extension). gcc is very aggressive about optimizations when it assumes no aliasing, so be careful.
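The no-aliasing promise could look roughly like the sketch below. The function name scaled_accumulate and the element type double are assumptions made for illustration; the original post shows neither the enclosing function nor the array types. Only the loop body is taken from the question.

```cpp
#include <cassert>

// Hypothetical wrapper around the loop from the question.
// __restrict is a compiler extension (accepted by g++), not standard C++.
// It is a promise made by the caller that y, a, and x never overlap;
// calling this with overlapping arrays is undefined behavior.
void scaled_accumulate(double* __restrict y,
                       const double* __restrict a,
                       const double* __restrict x,
                       int start, int end, int offset, int d)
{
    assert(start <= end);  // precondition assumed for this sketch
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}
```

Whether the loop actually gets vectorized still depends on the compiler version and flags, so it is worth re-checking the vectorizer output after making a change like this.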
One possibility is that the compiler can't guarantee that there are no aliases. In other words, how can the compiler be sure that y, a, and x don't overlap in some way? If you turn the verbosity level up, you may get some extra info.
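To see why the compiler has to be cautious, here is a small sketch of what happens if the arrays do overlap. The function name accumulate_may_alias and the element type double are assumptions for illustration; only the loop body comes from the question.

```cpp
#include <cassert>

// Hypothetical version of the loop with plain pointers, so the compiler
// must allow for the possibility that x points into y.
void accumulate_may_alias(double* y, const double* a, const double* x,
                          int start, int end, int offset, int d)
{
    assert(start <= end);  // precondition assumed for this sketch
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}

// If a caller passes x == y and d == -1, every iteration reads the value
// written by the previous one:
//   y[loc] += a[loc] * y[loc - 1]
// Starting from y = {1, 1, 1, 1} and a = {1, 1, 1, 1} with start = 1, the
// serial loop produces y = {1, 2, 3, 4}. A vectorized loop that loaded
// several x values before storing any y values would produce {1, 2, 2, 2}.
```

Because both call patterns are legal for plain pointers, the compiler cannot reorder the loads and stores unless it can prove, or is told, that the arrays are distinct.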