Performance of Fortran matrix operations
I need to use Fortran instead of C for part of a project, and I am very new to Fortran. I am trying to do some large calculations, but the code is quite slow compared to C (maybe 10x slower or more, using Intel's compilers for both). I think the reason is that Fortran stores matrices in column-major order and I am doing operations like sum(matrix(i, j, :)); because the layout is column-major, this probably uses the cache very inefficiently (possibly not at all). However, I am not sure this is the actual reason, since I know so little about Fortran. My question: is the convention in Fortran to operate on column vectors rather than row vectors?
(BTW: I already checked the speed of Fortran using Intel's LAPACK libraries, and it is quite fast, so this is not a compiler or build issue.)
Thanks.
Mete
Comments (2)
Try changing the order of your loops when doing matrix operations. For example, if your C code has the i (row) loop as the outer loop and the j (column) loop as the inner loop, then in Fortran you want the j (column) loop as the outer loop and the i (row) loop as the inner loop.
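The loop-order advice above can be illustrated with a minimal sketch in C. This is not the answerer's original snippet; the function names (`offset_row_major`, `offset_col_major`, `sum_row_major`, `sum_col_major`) are hypothetical, and the Fortran layout is emulated with 0-based index arithmetic on a flat buffer. The point is only that the index which moves in steps of one element should sit in the inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in an n_rows x n_cols matrix stored row-major,
   the layout C uses for a[n_rows][n_cols]. */
size_t offset_row_major(size_t i, size_t j, size_t n_rows, size_t n_cols)
{
    assert(i < n_rows && j < n_cols);
    return i * n_cols + j;
}

/* Offset of the same element stored column-major, the layout Fortran uses
   for a(n_rows, n_cols). Indices are 0-based here for simplicity. */
size_t offset_col_major(size_t i, size_t j, size_t n_rows, size_t n_cols)
{
    assert(i < n_rows && j < n_cols);
    return j * n_rows + i;
}

/* C order: i outer, j inner. Each inner step moves one element forward. */
double sum_row_major(const double *a, size_t n_rows, size_t n_cols)
{
    double total = 0.0;
    for (size_t i = 0; i < n_rows; ++i)
        for (size_t j = 0; j < n_cols; ++j)
            total += a[offset_row_major(i, j, n_rows, n_cols)];
    return total;
}

/* Fortran order: j outer, i inner. With a column-major buffer this is the
   ordering in which each inner step moves one element forward. */
double sum_col_major(const double *a, size_t n_rows, size_t n_cols)
{
    double total = 0.0;
    for (size_t j = 0; j < n_cols; ++j)
        for (size_t i = 0; i < n_rows; ++i)
            total += a[offset_col_major(i, j, n_rows, n_cols)];
    return total;
}
```

Both functions visit every element exactly once; they differ only in which index is innermost, chosen to match the storage layout they assume.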
An alternative approach, which achieves the same thing, is to keep the loops as they are but change the definition of the array: if in C it's A[x][y][z][t], then in Fortran make it A[t][z][y][x] (written A(t, z, y, x) in Fortran syntax), assuming that t is the fastest-varying loop index and x the slowest.
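To see why reversing the index order preserves the memory layout, here is a small sketch in C that compares the two offset formulas. The helper names (`c_offset`, `fortran_offset`, `layouts_agree`) are made up for illustration, and the Fortran side uses 0-based indices so the two formulas can be compared directly.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major (C) offset of A[x][y][z][t] in an array declared
   A[nx][ny][nz][nt]: the last index, t, is contiguous. */
size_t c_offset(size_t x, size_t y, size_t z, size_t t,
                size_t nx, size_t ny, size_t nz, size_t nt)
{
    assert(x < nx && y < ny && z < nz && t < nt);
    return ((x * ny + y) * nz + z) * nt + t;
}

/* Column-major (Fortran) offset of A(t, z, y, x) in an array declared
   A(nt, nz, ny, nx), using 0-based indices: the first index, t, is contiguous. */
size_t fortran_offset(size_t t, size_t z, size_t y, size_t x,
                      size_t nt, size_t nz, size_t ny, size_t nx)
{
    assert(t < nt && z < nz && y < ny && x < nx);
    return t + nt * (z + nz * (y + ny * x));
}

/* Returns 1 if every element lands at the same offset under both views. */
int layouts_agree(size_t nx, size_t ny, size_t nz, size_t nt)
{
    for (size_t x = 0; x < nx; ++x)
        for (size_t y = 0; y < ny; ++y)
            for (size_t z = 0; z < nz; ++z)
                for (size_t t = 0; t < nt; ++t)
                    if (c_offset(x, y, z, t, nx, ny, nz, nt) !=
                        fortran_offset(t, z, y, x, nt, nz, ny, nx))
                        return 0;
    return 1;
}
```

Because the offsets coincide, a loop nest with x outermost and t innermost walks memory contiguously in both languages once the declaration is reversed.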
As you write, Fortran is column-major, with the first index varying fastest in memory, so sum(matrix(i, j, :)) sums non-contiguous locations. If this really is the cause of the slowdown, you could redefine your matrix with a different order of dimensions, so that the current third dimension becomes the first.
Yes, if this is your main computation, rearrange the matrix so that the summation becomes a column operation. Explicit loops should be ordered so that the earlier indices vary fastest, as described by @PaulR. If you previously worked out the optimum index order for C and are now moving to Fortran, this is one aspect that might need changing.
While this is true in theory, I doubt it matters that much in practice, unless perhaps the array is enormous. (The worst case would be part of the array in RAM and part in swap on disk!) The first rule of run-time speed problems is: don't guess, measure. It is usually the algorithm.
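To make the stride argument about sum(matrix(i, j, :)) concrete, here is a minimal sketch in C that emulates a column-major 3-D array on a flat buffer. The names (`offset3`, `sum_over_last`, `sum_over_first`) are hypothetical and indices are 0-based; it shows the access pattern only and says nothing about how much time either version takes on a given machine.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major offset of element (i, j, k) in an array with extents
   n1 x n2 x n3, i.e. Fortran's matrix(n1, n2, n3) with 0-based indices. */
size_t offset3(size_t i, size_t j, size_t k,
               size_t n1, size_t n2, size_t n3)
{
    assert(i < n1 && j < n2 && k < n3);
    return i + n1 * (j + n2 * k);
}

/* Analogue of sum(matrix(i, j, :)) for matrix(n1, n2, n3):
   consecutive summands are n1 * n2 elements apart. */
double sum_over_last(const double *m, size_t i, size_t j,
                     size_t n1, size_t n2, size_t n3)
{
    double total = 0.0;
    for (size_t k = 0; k < n3; ++k)
        total += m[offset3(i, j, k, n1, n2, n3)];
    return total;
}

/* Analogue of sum(matrix(:, i, j)) after reordering the dimensions to
   matrix(n3, n1, n2): consecutive summands are adjacent in memory. */
double sum_over_first(const double *m, size_t i, size_t j,
                      size_t n3, size_t n1, size_t n2)
{
    double total = 0.0;
    for (size_t k = 0; k < n3; ++k)
        total += m[offset3(k, i, j, n3, n1, n2)];
    return total;
}
```

Whether the larger stride matters for a particular array size is exactly the kind of thing to time rather than assume, for example by wrapping each call in a timer and comparing the two orderings on realistic data.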