Optimising Finite Difference with SSE
I am wondering if it is possible to use SSE (1,2,3,4,...) to optimise the following loop:
// u and v are allocated through new double[size*size]
for (int j = 1; j < size-1; ++j)
{
    for (int k = 1; k < size-1; ++k)
    {
        v[j*size + k] = (u[j*size + k-1] + u[j*size + k+1]
                       + u[(j-1)*size + k] + u[(j+1)*size + k]) / 4.0;
    }
}
The [j*size + k] idiom is used to treat the block of memory as if it were a multi-dimensional array.
Sadly, the -ftree-vectorize flag for GCC (4.5) does not consider the loop amenable to SIMD-type optimisation. (That said, I have never seen -ftree-vectorize optimise anything but the most trivial of loops.)
While I am aware that there are many other ways to improve the performance of the loop (OpenMP, unrolling, in-place algorithms, etc.), I am specifically interested to know if SIMD can be used. I am more interested in the general outline of how (if at all) such a loop could be transformed than in a concrete implementation.
Comments (1)
It looks like it should be possible, but since (a) you're using doubles, so an SSE register holds only two values, (b) you're doing very little computation relative to I/O, and (c) most modern x86-64 CPUs have two FPUs anyway, you may not get much return on your SIMD coding investment.
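For the general outline of the transformation, here is a minimal sketch using SSE2 intrinsics. It is an illustration under stated assumptions, not a tuned implementation: the function names `stencil_scalar` and `stencil_sse2` are made up for this example, the outer loop is assumed to start at `j = 1`, and `u` and `v` are assumed to be distinct arrays. The idea is to compute two adjacent `k` values per iteration, since a 128-bit register holds two doubles, and to finish any leftover column with scalar code. Unaligned loads are used because the shifted neighbour loads cannot all be 16-byte aligned at once.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_add_pd, ...

// Scalar reference: the loop from the question (j assumed to start at 1).
void stencil_scalar(const double* u, double* v, int size)
{
    for (int j = 1; j < size - 1; ++j)
        for (int k = 1; k < size - 1; ++k)
            v[j*size + k] = (u[j*size + k-1] + u[j*size + k+1]
                           + u[(j-1)*size + k] + u[(j+1)*size + k]) / 4.0;
}

// SSE2 sketch: two interior points per iteration, scalar code for the remainder.
void stencil_sse2(const double* u, double* v, int size)
{
    assert(u != v);  // assumption: out-of-place update, as in the question
    const __m128d quarter = _mm_set1_pd(0.25);  // multiply by 0.25 instead of dividing by 4

    for (int j = 1; j < size - 1; ++j)
    {
        const double* row  = u + j*size;   // current row
        const double* up   = row - size;   // row j-1
        const double* down = row + size;   // row j+1
        double*       out  = v + j*size;

        int k = 1;
        // Vector part: handles columns k and k+1, both of which must be interior.
        for (; k + 1 < size - 1; k += 2)
        {
            __m128d left  = _mm_loadu_pd(row + k - 1);  // u[j][k-1], u[j][k]
            __m128d right = _mm_loadu_pd(row + k + 1);  // u[j][k+1], u[j][k+2]
            __m128d above = _mm_loadu_pd(up + k);       // u[j-1][k], u[j-1][k+1]
            __m128d below = _mm_loadu_pd(down + k);     // u[j+1][k], u[j+1][k+1]

            // Same summation order as the scalar expression.
            __m128d sum = _mm_add_pd(_mm_add_pd(_mm_add_pd(left, right), above), below);
            _mm_storeu_pd(out + k, _mm_mul_pd(sum, quarter));
        }
        // Scalar remainder: at most one interior column is left over.
        for (; k < size - 1; ++k)
            out[k] = (row[k-1] + row[k+1] + up[k] + down[k]) / 4.0;
    }
}
```

Whether this beats the scalar loop has not been measured here. Each output point still needs four loads for three additions and one multiply, so the loop may well be limited by memory traffic rather than arithmetic, which is the concern raised in points (a) to (c) above.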