Why is this loop not vectorized?

Posted on 2024-12-06 01:06:47

When I profile the code I am working on, one particular hot spot is the following loop:

for(int loc = start; loc<end; ++loc)
    y[loc]+=a[offset+loc]*x[loc+d];

where the arrays y, a, and x do not overlap. It seems to me that a loop like this should be easy to vectorize. However, when I compile with g++ using the options "-O3 -ftree-vectorize -ftree-vectorizer-verbose=1", I get no indication that this particular loop was vectorized. In contrast, a loop that occurs just before the code above:

for(int i=0; i<m; ++i)
    y[i]=0;

does get vectorized according to the output. Any thoughts on why the first loop shown above is not vectorized, or how I might fix this? (I don't know much about vectorization, so I am likely missing something quite obvious.)

As per Oli's suggestion, turning up the verbosity yields the following notes (while I am usually good at reading compiler warnings/errors/output, I have no idea what this means):

./include/mv_ops.h:89: note: dependence distance  = 0.
./include/mv_ops.h:89: note: accesses have the same alignment.
./include/mv_ops.h:89: note: dependence distance modulo vf == 0 between *D.50620_89 and *D.50620_89
./include/mv_ops.h:89: note: not vectorized: can't determine dependence between *D.50623_98 and *D.50620_89

Comments (2)

梦中楼上月下 2024-12-13 01:06:48

You need to tell the compiler that x, y, and a do not overlap. In C/C++ terms, that means telling the compiler that those pointers do not alias by declaring them with restrict (standard in C99) or __restrict (a compiler extension available in g++). gcc optimizes very aggressively when it assumes no aliasing, so be careful: if the pointers do overlap after all, the behavior is undefined.
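As a minimal sketch of that suggestion, the loop from the question could be wrapped in a function whose pointer parameters are marked __restrict. The function name axpy_like and the element type double are assumptions made for illustration; the question does not show either. Whether the loop then actually vectorizes depends on the compiler version and target, so the vectorizer output is still worth checking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the loop from the question.
// __restrict is a compiler extension (GCC, Clang, MSVC), not standard C++.
// It is a promise from the caller that y, a, and x never overlap;
// breaking that promise is undefined behavior.
void axpy_like(double* __restrict y,
               const double* __restrict a,
               const double* __restrict x,
               int start, int end, int offset, int d)
{
    assert(start <= end);  // precondition assumed for this sketch
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}
```

A caller would pass three separate buffers, for example the data() pointers of three distinct std::vector<double> objects.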

肥爪爪 2024-12-13 01:06:48

One possibility is that the compiler can't guarantee that there are no aliases. In other words, how can the compiler be sure that y, a and x don't overlap in some way?

If you turn the verbosity level up, you may get some extra info.
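To see why possible overlap matters, here is a small sketch with two hypothetical helpers (the names and the double element type are assumptions, not from the question). The first runs the loop in scalar order. The second is a simplified model of a vectorized loop: it reads all of x before storing anything to y. A real vectorizer works in chunks rather than copying the whole array, but the effect on overlapping data is of the same kind.

```cpp
#include <cassert>
#include <vector>

// Scalar loop as written: each iteration can see stores from earlier ones.
void update_in_order(double* y, const double* a, const double* x, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a[i] * x[i];
}

// Simplified model of a vectorized loop: load all of x first, then store.
void update_from_snapshot(double* y, const double* a, const double* x, int n)
{
    assert(n >= 0);  // precondition assumed for this sketch
    std::vector<double> x_copy(x, x + n);  // all loads happen before any store
    for (int i = 0; i < n; ++i)
        y[i] += a[i] * x_copy[i];
}
```

If y points one element past x in the same buffer, the two versions give different results, because in the scalar loop each iteration reads a value the previous iteration just wrote. Since the compiler must preserve the scalar result, it cannot vectorize unless it can rule out such overlap.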
