Fortran: global work array vs. local dynamically allocated arrays
I am working with an older F77 code that has been upgraded to F9X. It still has some older "legacy" code structure, and I'm curious about the performance implications of adding code in the legacy way versus the modern way. We have a separate F9X code that we are trying to integrate into this older code, reusing as many of the older code's procedures as possible instead of rewriting our own versions. Also note: assume that none of these procedures has an explicit interface.
Specifically, the old code has one large rank-1 work array that is allocated in the main program. As this array is passed deeper into procedures, it is split apart and used where it is needed. Essentially there is one allocation/deallocation, and the only overhead with this array is finding the starting indices of the needed temporary arrays (trivial) and passing these sections of the work array into the procedure.
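As a rough illustration of the work-array pattern described above, here is a minimal sketch. All names (`work`, `ia`, `ib`, `kernel`) and sizes are hypothetical and not taken from the actual code; the external subroutine is deliberately called without an explicit interface, as in the legacy code.

```fortran
program work_array_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000        ! hypothetical temporary size
  real(dp), allocatable :: work(:)
  integer :: ia, ib
  real(dp) :: total

  ! One allocation for the whole run.
  allocate(work(2*n))

  ! Starting indices of two temporaries carved out of work.
  ia = 1
  ib = ia + n

  ! Passing the element work(ia) hands the callee the storage that
  ! starts there (sequence association); no explicit interface needed.
  call kernel(n, work(ia), work(ib), total)
  print *, 'total =', total

  deallocate(work)
end program work_array_demo

! External procedure with explicit-shape dummy arrays.
subroutine kernel(n, a, b, total)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, intent(in) :: n
  real(dp), intent(inout) :: a(n), b(n)
  real(dp), intent(out) :: total
  integer :: i

  do i = 1, n
     a(i) = real(i, dp)
     b(i) = 2.0_dp * a(i)
  end do
  total = sum(b)
end subroutine kernel
```

The only per-call cost here is the index arithmetic for `ia` and `ib`; the bookkeeping burden is that every caller has to know how much of `work` each callee needs.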
Our new code generally uses lower-level procedures from the old code whose dummy arrays are normally supplied from the older code's global work array. Instead of the hassle of creating our own work array, finding starting indices, and passing all these array sections with their starting indices, I could just create dynamically allocated arrays where they are needed. However, these procedures can be called thousands of times during execution (possibly millions for some lower-level routines), and I am concerned about the overhead of allocating and deallocating each time any of these procedures is used. Also, these temporary arrays could contain many millions of double precision elements.
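The allocate-where-needed alternative might look like the sketch below. The routine name `kernel_alloc` and the array `tmp` are hypothetical; the point is only that each call pays for one `allocate` and one `deallocate`.

```fortran
program alloc_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: total

  call kernel_alloc(1000, total)
  print *, 'total =', total
end program alloc_demo

subroutine kernel_alloc(n, total)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, intent(in) :: n
  real(dp), intent(out) :: total
  real(dp), allocatable :: tmp(:)
  integer :: i, ierr

  ! Allocation happens on every call.
  allocate(tmp(n), stat=ierr)
  if (ierr /= 0) stop 'allocation of tmp failed'

  do i = 1, n
     tmp(i) = real(i, dp)
  end do
  total = sum(tmp)

  ! Explicit for clarity; in Fortran 95 and later an unsaved local
  ! allocatable is deallocated automatically on return.
  deallocate(tmp)
end subroutine kernel_alloc
```

Using `stat=` lets the routine report a failed allocation instead of aborting inside the runtime, which matters when the temporaries hold millions of elements.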
I've also dabbled with automatic arrays, but stopped when I started running into stack overflows, and I now use dynamic arrays almost exclusively. I've heard different things about how the stack and the heap are used to store different kinds of arrays, but I really don't know the difference or which is better (performance, efficiency, etc.).
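For comparison, an automatic array is a local array whose bounds depend on a dummy argument, with no `allocate` statement at all. The sketch below uses hypothetical names (`kernel_auto`, `tmp`) and a small size; where the compiler places `tmp` (stack or heap) is implementation dependent, which is why large sizes can overflow the stack.

```fortran
program auto_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: total

  call kernel_auto(1000, total)
  print *, 'total =', total
end program auto_demo

subroutine kernel_auto(n, total)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, intent(in) :: n
  real(dp), intent(out) :: total
  real(dp) :: tmp(n)    ! automatic array: created on entry, gone on return
  integer :: i

  do i = 1, n
     tmp(i) = real(i, dp)
  end do
  total = sum(tmp)
end subroutine kernel_auto
```

Unlike the allocatable version, there is no `stat=` to check here, so an automatic array that is too large typically fails with a crash rather than a recoverable error.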
Long story short, are these dynamically allocated (or automatic) arrays going to be significantly less efficient because of this overhead? I realize that dynamically allocated arrays are more robust over the life span of the code, but what I am really after is performance. A 5% performance gain could mean many hours saved in code execution.
I realize I might not get a definitive answer to this due to differences in compiler optimizations and other factors but I'm curious if anyone might have some knowledge/experience with anything similar. Thanks for your help.
Comments (1)
I think that any answers are going to be guesses and speculation. My guess: array creation is going to be a very low CPU load. Unless these subroutines do a negligible amount of computation, the difference in overhead between the array types won't be noticeable. But the only way to be sure is to try the different methods and time them, e.g., with the Fortran intrinsic cpu_time.
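A minimal timing sketch along those lines is shown below. The routine names, the array size `n`, and the call count `ncalls` are all assumptions chosen for illustration; the real procedures and realistic sizes would need to be substituted to get a meaningful answer.

```fortran
program time_alloc
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000, ncalls = 1000   ! hypothetical sizes
  real(dp), allocatable :: work(:)
  real(dp) :: s, acc
  real :: t0, t1, t2
  integer :: k

  allocate(work(n))
  acc = 0.0_dp

  ! Variant 1: reuse one preallocated work array.
  call cpu_time(t0)
  do k = 1, ncalls
     call kernel_work(n, work, s)
     acc = acc + s
  end do

  ! Variant 2: allocate and deallocate inside every call.
  call cpu_time(t1)
  do k = 1, ncalls
     call kernel_alloc(n, s)
     acc = acc + s
  end do
  call cpu_time(t2)

  print *, 'work array    (s):', t1 - t0
  print *, 'allocate/call (s):', t2 - t1
  print *, 'checksum         :', acc   ! keeps the loops from being optimized away
  deallocate(work)
end program time_alloc

subroutine kernel_work(n, tmp, s)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, intent(in) :: n
  real(dp), intent(inout) :: tmp(n)
  real(dp), intent(out) :: s
  integer :: i
  do i = 1, n
     tmp(i) = real(i, dp)
  end do
  s = sum(tmp)
end subroutine kernel_work

subroutine kernel_alloc(n, s)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, intent(in) :: n
  real(dp), intent(out) :: s
  real(dp), allocatable :: tmp(:)
  integer :: i
  allocate(tmp(n))
  do i = 1, n
     tmp(i) = real(i, dp)
  end do
  s = sum(tmp)
  deallocate(tmp)
end subroutine kernel_alloc
```

Both kernels do the same arithmetic, so any difference between the two timings is attributable to allocation. The result will vary with compiler, optimization level, and array size, so it is worth repeating the measurement with the settings used in production.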
Automatic arrays are usually placed on the stack, but some compilers place large automatic arrays on the heap, and some compilers have an option to change this behavior. Allocatable arrays are probably on the heap.