Fortran's accuracy and speed compared with C
This subject has probably been discussed hundreds of times. I'm not trying to claim
any language is worse or better. I'm just trying to learn how to accelerate my C codes.
So here are two codes to calculate Pi.
The first is in Fortran90:
program calcpi
  implicit none
  integer :: i
  real*8 :: pi
  pi=0.0
  do i = 0,1000000000
    pi = pi + 1.0/(4.0*i+1.0)
    pi = pi - 1.0/(4.0*i+3.0)
  end do
  pi = pi * 4.0
  write(*,*) pi
end program calcpi
The second is in C:
#include <stdio.h>
#define STEPCOUNTER 1000000001
int main(int argc, char * argv[])
{
    long i;
    double pi=0;
    #pragma omp parallel for reduction(+: pi)
    for ( i=0 ; i < STEPCOUNTER; i++){
        /* pi/4 = 1/1 - 1/3 + 1/5 - 1/7 + ...
           To avoid the need to continually change
           the sign (s=1; in each step s=s*-1 ),
           we add two elements at the same time. */
        pi+=1.0/(i*4.0+1.0);
        pi-=1.0/(i*4.0+3.0);
        // pi = pi + 1.0/(i*4.0+1.0);
        // pi = pi - 1.0/(i*4.0+3.0);
    }
    pi=pi*4.0;
    printf("Pi=%lf\n",pi);
    return 0;
}
I am compiling both codes with gcc version 4.4.4 on a CentOS 6 machine.
[oz@centos ~]$ gfortran calcpi.f90 -o calcpi.fort.o
[oz@centos ~]$ gfortran calcpi.c -o calcpi.c.o
The CPU is Intel(R) Xeon(R) CPU 5160 @ 3.00GHz.
So, here is how much time it takes to run each code:
[oz@centos ~]$ time ./calcpi.c.o
Pi=3.141593
real 0m33.270s
user 0m33.261s
sys 0m0.000s
[oz@centos ~]$ time ./calcpi.fort.o
3.1415926553497115
real 0m27.220s
user 0m27.208s
sys 0m0.001s
Fortran is about 20% faster.
My question is: which compiler flags give the best speedup while still preserving stability and accuracy?
(And yes, I know about man gcc; I want to hear users' opinions.)
Thanks for your opinions.
Result without the OpenMP pragma:
[oz@centos ~]$ time ./calcpi.c.o
Pi=3.141593
real 0m32.892s
user 0m32.885s
sys 0m0.001s
Other results, without changing the code itself:
$ gcc -O2 calcpi.c -o calcpi.c.o
$ time ./calcpi.c.o
Pi=3.141593
real 0m21.085s
user 0m21.078s
sys 0m0.000s
$ gfortran -O2 calcpi.c -o calcpi.c.o
$ time ./calcpi.fort.o
3.1415926553497115
real 0m26.892s
user 0m26.888s
sys 0m0.000s
Comments (1)
Modifying the Fortran program such that it corresponds to the C version by making all calculations in double precision:
Compiling with -O2 using GCC 4.4.3 on x86_64-linux-gnu on a Xeon X3450 (2.67 GHz) I get the following timings:
In other words, they are more or less indistinguishable, which is about what one would expect for such a simple example.