High Performance Fortran (HPF) without directives?
In High Performance Fortran (HPF), I can specify the distribution of arrays involved in a parallel calculation using the DISTRIBUTE directive. For example, the following minimal subroutine sums two arrays in parallel:
subroutine mysum(x,y,z)
integer, intent(in) :: y(10000), z(10000)
integer, intent(out) :: x(10000)
!HPF$ DISTRIBUTE x(BLOCK), y(BLOCK), z(BLOCK)
x = y + z
end subroutine mysum
My question is: is the DISTRIBUTE directive necessary? I know that in practice this is of little interest, but I'm curious whether an unadorned, directive-free Fortran program could also be a valid HPF program.
Comments (1)
I do not believe the DISTRIBUTE directive is necessary, and I have never used it.
You can achieve this implicitly by using FORALL statements instead of DO loops where applicable. Originally, a DO loop imposed an explicit order of operations on the array elements, whereas FORALL allowed the processor to determine the order itself. I do not think this makes much difference nowadays, because modern compilers can optimize, vectorize, and parallelize DO loops where possible. I cannot speak for other compilers, but I remember using the Intel Fortran Compiler to compile and run a program in parallel on 2 and 4 processors without using DISTRIBUTE.
However, depending on the processor architecture and compiler, it is best to experiment with what you have and see what gives the best results or efficiency.
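To make the FORALL versus DO comparison above concrete, here is a minimal sketch that computes the question's sum three ways with no directives at all. It assumes a compiler that accepts Fortran 2008 (for `error stop`). The program name `forall_vs_do` and the variables `x_array`, `x_forall`, and `x_do` are made up for illustration. Nothing in it is HPF-specific, and whether any of the forms actually runs in parallel depends on the compiler and its options.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 10000
  integer :: x_array(n), x_forall(n), x_do(n), y(n), z(n)
  integer :: i

  ! Fill the inputs with known values.
  do i = 1, n
    y(i) = i
    z(i) = 2*i
  end do

  ! Whole-array assignment, as in the question's mysum, without a directive.
  x_array = y + z

  ! FORALL: element assignments with no prescribed iteration order.
  forall (i = 1:n) x_forall(i) = y(i) + z(i)

  ! DO loop: iterations are specified in order, i = 1, 2, ..., n.
  do i = 1, n
    x_do(i) = y(i) + z(i)
  end do

  ! Stop with an error if any form disagrees with the others.
  if (any(x_array /= x_forall) .or. any(x_array /= x_do)) error stop "results differ"
  print *, "all three forms agree; x(n) =", x_array(n)
end program forall_vs_do
```

All three forms should produce the same array, so the final element is 30000. Note that the Fortran 2018 standard classifies FORALL as obsolescent, so a compiler may warn about it when asked for strict conformance to that standard.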