Why doesn't GCC auto-vectorize this loop?
I am attempting to optimize a loop that accounts for a lot of my program's computation time.
But when I turn on auto-vectorization with -O3 -ffast-math -ftree-vectorizer-verbose=6, GCC reports that it cannot vectorize the loop.
I am using GCC 4.4.5.
The code:
/// Find the point in the path with the largest v parameter
void prediction::find_knife_edge(
    const float * __restrict__ const elevation_path,
    float * __restrict__ const diff_path,
    const float path_res,
    const unsigned a,
    const unsigned b,
    const float h_a,
    const float h_b,
    const float f,
    const float r_e
) const
{
    float wavelength = (speed_of_light * 1e-6f) / f;
    float d_ab = path_res * static_cast<float>(b - a);
    for (unsigned n = a + 1; n <= b - 1; n++)
    {
        float d_an = path_res * static_cast<float>(n - a);
        float d_nb = path_res * static_cast<float>(b - n);
        float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
        float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
        diff_path[n] = v;
    }
}
The messages from GCC:
note: not vectorized: number of iterations cannot be computed.
note: not vectorized: unhandled data-ref
On the page about auto-vectorization ( http://gcc.gnu.org/projects/tree-ssa/vectorization.html ) it states that it supports unknown loop bounds.
If I replace the for with
for (unsigned n = 0; n <= 100; n++)
then it vectorizes it.
What am I doing wrong?
The lack of detailed documentation on exactly what these messages mean and the ins/outs of GCC auto-vectorization is rather annoying.
EDIT:
Thanks to David I changed the loop to this:
for (unsigned n = a + 1; n < b; n++)
Now GCC attempts to vectorize the loop but reports this:
note: not vectorized: unhandled data-ref
note: Alignment of access forced using peeling.
note: Vectorizing an unaligned access.
note: vect_model_induction_cost: inside_cost = 1, outside_cost = 2 .
note: not vectorized: relevant stmt not supported: D.76777_65 = (float) n_34;
What does "D.76777_65 = (float) n_34;" mean?
Comments (1)
I may have slightly botched the details, but this is the way you need to restructure your loop to get it to vectorize. The trick is to precompute the number of iterations and iterate from 0 to one short of that number. Do not change the for statement. You may need to fix the two lines before it and the two lines at the top of the loop. They're approximately right. ;)
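A minimal sketch of that restructuring, written as an illustration and not as the answerer's original code. It assumes a free function (the name find_knife_edge_sketch is hypothetical) and a stand-in speed_of_light constant in place of the class member from the question. It also switches the loop counter to a signed int. The note "D.76777_65 = (float) n_34" appears to be GCC's internal form of the unsigned-to-float cast, and SSE2 only offers a packed conversion for signed 32-bit integers, so a signed counter is a plausible way around that note. Treat that part as an assumption; it has not been checked against GCC 4.4.5.

```cpp
#include <cmath>

// Stand-in for the class member used in the question (assumed value, m/s).
static const float speed_of_light = 299792458.0f;

// Hypothetical free-function variant of prediction::find_knife_edge.
void find_knife_edge_sketch(
    const float * __restrict__ elevation_path,
    float * __restrict__ diff_path,
    const float path_res,
    const unsigned a,
    const unsigned b,
    const float h_a,
    const float h_b,
    const float f,
    const float r_e)
{
    // No interior points between a and b: nothing to compute.
    if (b <= a || b - a < 2)
        return;

    const float wavelength = (speed_of_light * 1e-6f) / f;
    const int span = static_cast<int>(b - a);
    const float d_ab = path_res * static_cast<float>(span);

    // Precompute the trip count and rebase the arrays so the loop
    // runs from 0 to count - 1 with a signed counter.
    const int count = span - 1;
    const float *in = elevation_path + a + 1;
    float *out = diff_path + a + 1;

    for (int i = 0; i < count; ++i)
    {
        // i corresponds to n = a + 1 + i in the original loop,
        // so n - a == i + 1 and b - n == count - i.
        const float d_an = path_res * static_cast<float>(i + 1);
        const float d_nb = path_res * static_cast<float>(count - i);
        const float h = in[i] + (d_an * d_nb) / (2.0f * r_e)
                      - (h_a * d_nb + h_b * d_an) / d_ab;
        out[i] = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    }
}
```

Compile it with the same flags as in the question (-O3 -ffast-math -ftree-vectorizer-verbose=6) and read the vectorizer report to see whether the remaining notes go away on your GCC version.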