Performance of Fortran matrix operations

Posted on 2024-11-07 22:03:20

I need to use Fortran instead of C somewhere and I am very new to Fortran. I am trying to do some big calculations, but it is quite slow compared to C (maybe 10x or more, and I am using Intel's compilers for both). I think the reason is that Fortran stores matrices in column-major order and I am doing operations like sum(matrix(i, j, :)); with a column-major layout this probably uses the cache very inefficiently (possibly not at all). However, I am not sure whether this is the actual reason, since I know so little about Fortran. My question is: is the convention in Fortran to operate on column vectors rather than row vectors?

(BTW: I have already checked Fortran's speed using Intel's LAPACK libraries, and it is quite fast, so this is not related to any compiler or build issue.)

Thanks.

Mete

Comments (2)

神爱温柔 2024-11-14 22:03:20

Try changing the order of your loops when doing matrix operations, e.g. if you have something like this in C:

for (i = 0; i < M; ++i) // for each row
{
    for (j = 0; j < N; ++j) // for each col
    {
        // matrix operations on e.g. A[i][j]
    }
}

then in Fortran you want the j (column) loop as the outer loop and the i (row) loop as the inner loop.
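As a rough illustration of the loop order described above, here is a minimal Fortran sketch. The array name `a`, the sizes `m` and `n`, and the fill expression are placeholders chosen for the example, not anything from the original code.

```fortran
program loop_order
  implicit none
  integer, parameter :: m = 4, n = 3   ! illustrative sizes (assumed)
  real :: a(m, n)
  integer :: i, j

  ! Fortran stores a(1,1), a(2,1), ..., a(m,1), a(1,2), ... in memory,
  ! so the first index should vary in the innermost loop.
  do j = 1, n          ! columns: outer loop
     do i = 1, m       ! rows: inner loop, consecutive memory locations
        a(i, j) = real(i + 10 * j)   ! stand-in for the matrix operation
     end do
  end do

  print *, a(2, 3)
end program loop_order
```

With this ordering the inner loop walks down one column at a time, which is the direction in which elements are adjacent in memory.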

An alternative approach, which achieves the same thing, is to keep the loops as they are but change the definition of the array, e.g. if in C it's A[x][y][z][t] then in Fortran declare it as A(t, z, y, x), assuming that t is the fastest varying loop index and x the slowest.
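The index reversal can be checked with a small sketch like the one below. The dimension sizes `nx`, `ny`, `nz`, `nt` are made up for the example. The program keeps the C-style loop nest (x outermost, t innermost) and confirms that, with the array declared as `a(nt, nz, ny, nx)`, the loops visit elements in storage order.

```fortran
program index_order
  implicit none
  integer, parameter :: nx = 2, ny = 3, nz = 4, nt = 5   ! assumed sizes
  integer :: a(nt, nz, ny, nx)   ! C's A[x][y][z][t] with indices reversed
  integer :: x, y, z, t, k, n

  ! Same loop nest as in C: x slowest, t fastest.
  k = 0
  do x = 1, nx
     do y = 1, ny
        do z = 1, nz
           do t = 1, nt
              k = k + 1
              a(t, z, y, x) = k   ! record the visiting order
           end do
        end do
     end do
  end do

  ! reshape fills its result in array element (column-major) order, so
  ! equality means the loop nest touched memory sequentially.
  if (all(a == reshape([(n, n = 1, nt*nz*ny*nx)], shape(a)))) then
     print *, 'loop nest visits elements in storage order'
  else
     error stop 'loop nest does not follow storage order'
  end if
end program index_order
```

Only the declaration and the subscript order change; the loop structure carried over from C stays the same.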

埖埖迣鎅 2024-11-14 22:03:20

As you write, Fortran is column-major, with the first index varying fastest in memory, so sum(matrix(i, j, :)) sums non-contiguous locations. If this really is the cause of the slowdown, you could redefine your matrix with a different order of dimensions so that the current third dimension becomes the first. So yes, if this is your main computation, rearrange the matrix to make the summation a column operation. Explicit loops should have the earlier indices varying fastest, as described by @PaulR. If you previously worked out the optimal index order for C and are now moving to Fortran, this is one aspect that might need changing. But while this is true in theory, I doubt that it matters that much in practice, unless perhaps the array is enormous. (The worst case would be that part of the array is in RAM and part is in swap on disk!) The first rule of run-time speed issues is: don't guess, measure. It is usually the algorithm.
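One way to follow the "measure" advice is a small timing sketch like the following. It is only an illustration under assumed sizes (`ni`, `nj`, `nk`) and made-up array names (`m`, `mt`). It sums along the third dimension of `m` and along the first dimension of a rearranged copy `mt`, and times both with the standard `system_clock` intrinsic. Actual timings depend on the compiler, optimization flags, and hardware.

```fortran
program measure_sum
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  integer, parameter :: ni = 128, nj = 128, nk = 256   ! assumed sizes
  real(dp), allocatable :: m(:, :, :), mt(:, :, :), s1(:, :), s2(:, :)
  integer(int64) :: t0, t1, t2, rate
  integer :: i, j

  allocate(m(ni, nj, nk), mt(nk, ni, nj), s1(ni, nj), s2(ni, nj))
  call random_number(m)

  ! Same data, with the summed dimension moved to the front.
  do j = 1, nj
     do i = 1, ni
        mt(:, i, j) = m(i, j, :)
     end do
  end do

  call system_clock(t0, rate)
  do j = 1, nj
     do i = 1, ni
        s1(i, j) = sum(m(i, j, :))    ! elements ni*nj apart in memory
     end do
  end do
  call system_clock(t1)
  do j = 1, nj
     do i = 1, ni
        s2(i, j) = sum(mt(:, i, j))   ! contiguous elements
     end do
  end do
  call system_clock(t2)

  print *, 'strided sum    (s):', real(t1 - t0, dp) / real(rate, dp)
  print *, 'contiguous sum (s):', real(t2 - t1, dp) / real(rate, dp)
  print *, 'max difference    :', maxval(abs(s1 - s2))
end program measure_sum
```

If the two timings turn out similar for the real array sizes, the layout is probably not the bottleneck and the algorithm deserves a closer look. Note also that `sum(m, dim=3)` computes all the sums in one call and leaves the traversal order to the compiler.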
