Do Fortran 95 constructs such as WHERE, FORALL and SPREAD generally result in faster parallel code?
I have read through the Fortran 95 book by Metcalf, Reid and Cohen, and Numerical Recipes in Fortran 90. They recommend using WHERE, FORALL and SPREAD amongst other things to avoid unnecessary serialisation of your program.
However, I stumbled upon this answer which claims that FORALL is good in theory, but pointless in practice - you might as well write loops as they parallelise just as well and you can explicitly parallelise them using OpenMP (or automatic features of some compilers such as Intel).
Can anyone verify from experience whether they have generally found these constructs to offer any advantages over explicit loops and if statements in terms of parallel performance?
And are there any other parallel features of the language which are good in principle but not worth it in practice?
I appreciate that the answers to these questions are somewhat implementation dependent, so I'm most interested in gfortran, Intel CPUs and SMP parallelism.
Answers (5)
As I said in my answer to the other question, there is a general belief that FORALL has not been as useful as was hoped when it was introduced to the language. As already explained in other answers, it has restrictive requirements and a limited role, and compilers have become quite good at optimizing regular loops. Compilers keep getting better, and capabilities vary from compiler to compiler. Another clue is that Fortran 2008 is trying again: besides adding explicit parallelization to the language (co-arrays, already mentioned), there is also "do concurrent", a new loop form whose restrictions should better allow the compiler to perform automatic parallelization optimizations, yet should be sufficiently general to be useful. See ftp://ftp.nag.co.uk/sc22wg5/N1701-N1750/N1729.pdf.
In terms of obtaining speed, mostly I select good algorithms and program for readability and maintainability. Only if the program is too slow do I locate the bottlenecks and recode or implement multi-threading (OpenMP). It will be a rare case where FORALL or WHERE versus an explicit do loop makes a meaningful speed difference. I'd look more to how clearly they state the intent of the program.
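To make the difference between the two forms mentioned above concrete, here is a minimal sketch that writes the same element-wise update once with FORALL and once with DO CONCURRENT. The program name, array names and sizes are made up for illustration, and whether either form is actually parallelised or vectorised depends on the compiler and its flags.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 8      ! illustrative size, not from the thread
  integer :: i
  real :: a(n), b(n), c(n)

  b = [(real(i), i = 1, n)]

  ! FORALL is an assignment statement, not a loop: the right-hand side is
  ! evaluated for every index before any element is stored.
  forall (i = 1:n) a(i) = 2.0 * b(i)

  ! DO CONCURRENT (Fortran 2008) is a loop whose iterations the programmer
  ! asserts are independent, so the compiler may run them in any order.
  do concurrent (i = 1:n)
    c(i) = 2.0 * b(i)
  end do

  ! Self-check: both forms must produce the same values.
  if (any(a /= c)) error stop "results differ"
  print *, "forall and do concurrent agree"
end program do_concurrent_sketch
```

With gfortran this should build with a plain `gfortran file.f90`. Note that DO CONCURRENT only grants permission to reorder iterations; it does not by itself guarantee multi-threaded execution.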
I've looked shallowly into this and, sad to report, generally find that writing my loops explicitly results in faster programs than the parallel constructs you write about. Even simple whole-array assignments such as A = 0 are generally outperformed by do-loops.
I don't have any data to hand and if I did it would be out of date. I really ought to pull all this into a test suite and try again; compilers do improve (sometimes they get worse too).
I do still use the parallel constructs, especially whole-array operations, when they are the most natural way to express what I'm trying to achieve. I haven't ever tested these constructs inside OpenMP workshare constructs. I really ought to.
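As a starting point for the kind of test described above, the following sketch puts a whole-array assignment inside an OpenMP WORKSHARE construct next to the equivalent explicit loop under PARALLEL DO. The program name, array names and the size n are assumptions chosen for illustration, and no timings are claimed; wrapping each section in calls to the system_clock intrinsic would be one way to measure them on a given compiler.

```fortran
program workshare_sketch
  implicit none
  integer, parameter :: n = 100000   ! illustrative size
  integer :: i
  real, allocatable :: a(:), b(:)

  allocate(a(n), b(n))

  ! Whole-array assignment. With OpenMP enabled, WORKSHARE allows the
  ! compiler to divide the assignment among the threads of the team.
  !$omp parallel workshare
  a = 0.0
  !$omp end parallel workshare

  ! Equivalent explicit loop, parallelised with PARALLEL DO.
  !$omp parallel do
  do i = 1, n
    b(i) = 0.0
  end do
  !$omp end parallel do

  ! Self-check: both forms must leave identical arrays.
  if (any(a /= b)) error stop "results differ"
  print *, "both forms zeroed", n, "elements"
end program workshare_sketch
```

Compiled without OpenMP the directives are treated as comments and the program runs serially; with gfortran, `-fopenmp` enables them. How well WORKSHARE is implemented varies between compilers, so it needs to be measured rather than assumed.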
FORALL is a generalised masked assignment statement (as is WHERE). It is not a looping construct.
Compilers can parallelise FORALL/WHERE using SIMD instructions (SSE2, SSE3 etc.), which is very useful for getting a bit of low-level parallelisation. Of course, some poorer compilers don't bother and just serialise the code as a loop.
OpenMP and MPI are more useful at a coarser level of granularity.
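To illustrate what "masked assignment" means here, this small sketch writes the same computation as a WHERE construct and as an explicit loop with an if statement. The program name, variable names and input values are invented for the example, and whether either version is turned into SIMD instructions depends on the compiler and optimisation level.

```fortran
program masked_assignment_sketch
  implicit none
  integer, parameter :: n = 6
  integer :: i
  real :: x(n), y_where(n), y_loop(n)

  x = [-2.0, -1.0, 0.0, 1.0, 4.0, 9.0]   ! illustrative data

  ! WHERE: array assignment applied only where the mask is true;
  ! ELSEWHERE handles the remaining elements.
  where (x >= 0.0)
    y_where = sqrt(x)
  elsewhere
    y_where = 0.0
  end where

  ! The same computation as an explicit loop with an if statement.
  do i = 1, n
    if (x(i) >= 0.0) then
      y_loop(i) = sqrt(x(i))
    else
      y_loop(i) = 0.0
    end if
  end do

  ! Self-check: both forms must produce the same values.
  if (any(y_where /= y_loop)) error stop "results differ"
  print *, y_where
end program masked_assignment_sketch
```

The WHERE version states the intent without an index variable, while the loop version gives the compiler the same information in a form that modern optimisers also handle well.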
In theory, using such assignments lets the compiler know what you want to do and should allow it to optimize better. In practice, see the answer from Mark... I also think it's useful if the code looks cleaner that way. I have used things such as FORALL myself a couple of times, but didn't notice any performance change over regular DO loops.
As for optimization, what kind of parallelism do you intend to use? I very much dislike OpenMP, but I guess if you intend to use that, you should test these constructs first.
*This should be a comment, not an answer, but it won't fit into that little box, so I'm putting it here. Don't hold it against me :-) Anyway, to continue somewhat from @steabert's comment on his answer: OpenMP and MPI are two different things; one rarely gets to choose between the two, since the choice is dictated more by your architecture than by personal preference. As far as learning the concepts of parallelism goes, I would recommend OpenMP any day; it is simpler, and one can easily make the transition to MPI later on.
But that's not what I wanted to say. This is: a few days ago, Intel announced that it has started supporting Co-Arrays, an F2008 feature previously only supported by g95. They're not intending to put down g95, but the fact remains that Intel's compiler is more widely used for production code, so this is definitely an interesting line of development. They also changed some things in their Visual Fortran Compiler (the name, for a start :-)
More info after the link: http://software.intel.com/en-us/articles/intel-compilers/