Using SSE to optimise finite differences

Posted on 2024-10-03 02:32:40

I am wondering whether it is possible to use SSE (any of SSE, SSE2, SSE3, SSE4, ...) to optimise the following loop:

// u and v are allocated through new double[size*size]
for (int j = 1; j < size-1; ++j)   // interior rows only; rows 0 and size-1 are the boundary
{
    for (int k = 1; k < size-1; ++k)
    {
        v[j*size + k] = (u[j*size + k-1] + u[j*size + k+1] 
                       + u[(j-1)*size + k]+ u[(j+1)*size + k]) / 4.0;
    }
}

The [j*size + k] idiom treats the flat block of memory as if it were a two-dimensional, row-major array.
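The indexing and the loop body can be written out with explicit indices. The notation u_{j,k} for u[j*size + k] is introduced here only for illustration:

```latex
\begin{aligned}
\mathrm{offset}(j,k) &= j \cdot \mathit{size} + k \\
v_{j,k} &= \tfrac{1}{4}\bigl(u_{j,k-1} + u_{j,k+1} + u_{j-1,k} + u_{j+1,k}\bigr),
\qquad 1 \le j,\,k \le \mathit{size}-2
\end{aligned}
```

This is one Jacobi sweep of the five-point discretisation of Laplace's equation: each interior point of v becomes the average of its four neighbours in u. The k-neighbours lie at offsets ±1 and the j-neighbours lie at ±size. The four terms for consecutive values of k are therefore read from four runs of consecutive memory, which is what makes the inner loop a plausible SIMD candidate.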

Sadly, GCC 4.5 with the -ftree-vectorize flag does not consider the loop amenable to SIMD optimisation. (That said, I have never seen -ftree-vectorize optimise anything but the most trivial of loops.)
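One possible obstacle for the auto-vectorizer (an assumption, not a diagnosis of this exact case) is that u and v are plain double pointers, so the compiler cannot prove they do not overlap. The following is a minimal sketch. It uses a hypothetical function `jacobi_sweep` that hoists the row pointers and marks both arrays with `__restrict__`, a GCC/Clang extension rather than ISO C++, to promise that they never alias. Whether GCC 4.5 then vectorizes the inner loop is not guaranteed. GCC 4.x's `-ftree-vectorizer-verbose=N` option reports why a loop was rejected.

```cpp
#include <cstddef>

// Hypothetical helper: one sweep of the four-neighbour average from the loop
// above, written to be easy for an auto-vectorizer. __restrict__ (GCC/Clang
// extension) promises that u and v never overlap.
void jacobi_sweep(double* __restrict__ v, const double* __restrict__ u, int size)
{
    for (int j = 1; j < size - 1; ++j) {
        // Row pointers: the inner loop then only makes unit-stride accesses.
        const double* up  = u + static_cast<std::size_t>(j - 1) * size;
        const double* mid = u + static_cast<std::size_t>(j) * size;
        const double* dn  = u + static_cast<std::size_t>(j + 1) * size;
        double* out = v + static_cast<std::size_t>(j) * size;

        for (int k = 1; k < size - 1; ++k)
            // Multiplying by 0.25 gives the same result as dividing by 4.0.
            out[k] = (mid[k - 1] + mid[k + 1] + up[k] + dn[k]) * 0.25;
    }
}
```

Only the interior of v is written, so the boundary values of v are left as they were, as in the original loop.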

While I am aware that there are many other ways to improve the performance of the loop (OpenMP, unrolling, in-place algorithms, etc.), I specifically want to know whether SIMD can be used. I am more interested in the general outline of how (if at all) such a loop could be transformed than in a concrete implementation.
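As a rough outline, and as a sketch rather than a tuned implementation, assume an x86 or x86-64 target with SSE2. Each iteration of the inner loop only reads u and only writes v, so neighbouring values of k are independent. Two consecutive outputs can therefore be computed together in one 128-bit register holding two doubles. For the pair (k, k+1), each of the four neighbour terms is itself two adjacent doubles in memory:

- offsets k-1 and k+1 in row j;
- offset k in rows j-1 and j+1.

Each term can therefore be fetched with one unaligned `_mm_loadu_pd`. The function name `jacobi_sweep_sse2` is hypothetical. A scalar loop handles the last column when the number of interior columns is odd.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical SSE2 version of the loop above: two doubles per iteration.
void jacobi_sweep_sse2(double* v, const double* u, int size)
{
    const __m128d quarter = _mm_set1_pd(0.25);  // * 0.25 matches / 4.0 exactly

    for (int j = 1; j < size - 1; ++j) {
        const double* up  = u + static_cast<std::size_t>(j - 1) * size;
        const double* mid = u + static_cast<std::size_t>(j) * size;
        const double* dn  = u + static_cast<std::size_t>(j + 1) * size;
        double* out = v + static_cast<std::size_t>(j) * size;

        int k = 1;
        // Vector part: outputs k and k+1 at once. Unaligned loads and stores,
        // because j*size + k is not guaranteed to be 16-byte aligned.
        for (; k + 1 < size - 1; k += 2) {
            __m128d left  = _mm_loadu_pd(mid + k - 1);  // u[j][k-1], u[j][k]
            __m128d right = _mm_loadu_pd(mid + k + 1);  // u[j][k+1], u[j][k+2]
            __m128d above = _mm_loadu_pd(up + k);       // u[j-1][k], u[j-1][k+1]
            __m128d below = _mm_loadu_pd(dn + k);       // u[j+1][k], u[j+1][k+1]
            // Same summation order as the scalar loop: ((left + right) + above) + below.
            __m128d sum = _mm_add_pd(_mm_add_pd(_mm_add_pd(left, right), above), below);
            _mm_storeu_pd(out + k, _mm_mul_pd(sum, quarter));
        }
        // Scalar tail for an odd number of interior columns.
        for (; k < size - 1; ++k)
            out[k] = (mid[k - 1] + mid[k + 1] + up[k] + dn[k]) * 0.25;
    }
}
```

The design choice here is simplicity: every operand is an unaligned 16-byte load. Variants such as aligned loads combined with `_mm_shuffle_pd`, or AVX registers holding four doubles, follow the same outline.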
