Puzzling performance difference between ifort and gfortran

Posted 2024-12-27 13:26:34

Recently, I read a post on Stack Overflow about finding integers that are perfect squares. As I wanted to play with this, I wrote the following small program:

PROGRAM PERFECT_SQUARE
IMPLICIT NONE
INTEGER*8 :: N, M, NTOT
LOGICAL :: IS_SQUARE

N=Z'D0B03602181'
WRITE(*,*) IS_SQUARE(N)

NTOT=0
DO N=1,1000000000
  IF (IS_SQUARE(N)) THEN
    NTOT=NTOT+1
  END IF
END DO
WRITE(*,*) NTOT ! should find 31622 squares
END PROGRAM

LOGICAL FUNCTION IS_SQUARE(N)
IMPLICIT NONE
INTEGER*8 :: N, M

! check if negative
IF (N.LT.0) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! check if ending 4 bits belong to (0,1,4,9)
M=IAND(N,15)
IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! try to find the nearest integer to sqrt(n)
M=DINT(SQRT(DBLE(N)))
IF (M**2.NE.N) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

IS_SQUARE=.TRUE.
RETURN
END FUNCTION

Compiled with gfortran -O2, the running time is 4.437 seconds; with -O3 it is 2.657 seconds. I then thought that compiling with ifort -O2 might be faster, since it might have a faster SQRT function, but the running time turned out to be 9.026 seconds, and the same with ifort -O3. I tried to analyze it using Valgrind, and the Intel-compiled program does indeed execute many more instructions.

My question is why? Is there a way to find out where exactly the difference comes from?

EDITS:

gfortran version 4.6.2 and ifort version 12.0.2.
Times were obtained by running time ./a.out and are the real/user times (sys was always almost 0).
This is on Linux x86_64; both gfortran and ifort are 64-bit builds.
ifort inlines everything, gfortran only at -O3, but the gfortran assembly code is simpler than that of ifort, which uses xmm registers a lot.
Fixed a line of code: added NTOT=0 before the loop, which should fix the issue with other gfortran versions.

When the complex IF statement is removed, gfortran takes about 4 times as long (10-11 seconds). This is to be expected, since the statement throws out about 75% of the numbers and so avoids computing the SQRT for them. ifort, on the other hand, takes only slightly more time. My guess is that something goes wrong when ifort tries to optimize the IF statement.
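The 75% figure follows from the fact that a square can only be 0, 1, 4 or 9 modulo 16, so 12 of the 16 possible values of the low four bits are rejected before the SQRT. Here is a minimal sketch that checks this by enumeration; the module, function and program names are made up for this illustration and are not part of the program above.

```fortran
module residue_demo
  implicit none
contains
  ! Returns a mask of the values that k**2 can take modulo 16.
  ! k = 0..15 is enough, because (k+16)**2 is congruent to k**2 mod 16.
  function square_residues_mod16() result(seen)
    logical :: seen(0:15)
    integer :: k
    seen = .false.
    do k = 0, 15
      seen(iand(k*k, 15)) = .true.
    end do
  end function square_residues_mod16
end module residue_demo

program show_residues
  use residue_demo
  implicit none
  logical :: seen(0:15)
  integer :: r

  seen = square_residues_mod16()

  ! List the reachable residues, then the fraction the filter rejects.
  do r = 0, 15
    if (seen(r)) write(*,*) 'possible residue: ', r
  end do
  write(*,*) 'rejected fraction: ', real(16 - count(seen)) / 16.0
end program show_residues
```

Only four residues are reachable, so the filter lets through 4/16 of all inputs and the SQRT runs on roughly a quarter of them.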

EDIT2:

I tried ifort version 12.1.2.273 and it is much faster, so it looks like they fixed that.

不爱素颜 2025-01-03 13:26:34

What compiler versions are you using?
Interestingly, it looks like a performance regression from 11.1 to 12.0: for me, 11.1 (ifort -fast square.f90) takes 3.96s, while 12.0 (same options) takes 13.3s.
gfortran (4.6.1) (-O3) is still faster (3.35s).
I have seen this kind of a regression before, although not quite as dramatic.
BTW, replacing the if statement with

is_square = any(m == [0, 1, 4, 9])
if(.not. is_square) return

makes it run twice as fast with ifort 12.0, but slower in gfortran and ifort 11.1.
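For context, here is roughly how that replacement might sit inside the whole function. This is only a sketch: the module wrapper, the name is_square_any, the int64 kind and the shortened test loop are my own choices for illustration, not taken from the question.

```fortran
module square_any
  use iso_fortran_env, only: int64
  implicit none
contains
  logical function is_square_any(n) result(is_square)
    integer(int64), intent(in) :: n
    integer(int64) :: m

    ! Negative numbers are never squares.
    is_square = .false.
    if (n < 0) return

    ! Array comparison in place of the chained .OR. test on the low 4 bits.
    m = iand(n, 15_int64)
    is_square = any(m == [0_int64, 1_int64, 4_int64, 9_int64])
    if (.not. is_square) return

    ! Same truncating square-root check as in the question.
    m = int(sqrt(real(n, kind(1.0d0))), int64)
    is_square = (m**2 == n)
  end function is_square_any
end module square_any

program demo_any
  use iso_fortran_env, only: int64
  use square_any
  implicit none
  integer(int64) :: n, ntot

  ! Shortened loop (10**6 instead of 10**9) so the sketch runs quickly.
  ntot = 0
  do n = 1_int64, 1000000_int64
    if (is_square_any(n)) ntot = ntot + 1
  end do
  write(*,*) ntot   ! 1000 squares lie in 1..10**6
end program demo_any
```

Putting the function in a module may change how each compiler inlines it, so timings from this form are not directly comparable with the numbers quoted here.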

It looks like part of the problem is that 12.0 is overly aggressive in trying to vectorize things: adding

!DEC$ NOVECTOR

right before the DO loop (without changing anything else in the code) cuts the run time down to 4.0 sec.
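To make the placement concrete, here is the question's program with the directive added, as a sketch. I dropped the initial hex-constant check and the unused M in the main program and adjusted two comments; everything else is as posted. The directive is Intel-specific, and other compilers should simply treat the line as a comment.

```fortran
PROGRAM PERFECT_SQUARE_NOVECTOR
IMPLICIT NONE
INTEGER*8 :: N, NTOT
LOGICAL :: IS_SQUARE

NTOT=0
! Intel-specific directive: ask ifort not to vectorize the next loop.
!DEC$ NOVECTOR
DO N=1,1000000000
  IF (IS_SQUARE(N)) THEN
    NTOT=NTOT+1
  END IF
END DO
WRITE(*,*) NTOT ! should find 31622 squares
END PROGRAM

LOGICAL FUNCTION IS_SQUARE(N)
IMPLICIT NONE
INTEGER*8 :: N, M

! check if negative
IF (N.LT.0) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! check if ending 4 bits belong to (0,1,4,9)
M=IAND(N,15)
IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! truncate sqrt(n) to an integer and compare its square with n
M=DINT(SQRT(DBLE(N)))
IF (M**2.NE.N) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

IS_SQUARE=.TRUE.
RETURN
END FUNCTION
```

Since the directive only affects the loop that immediately follows it, the rest of the program is still optimized as usual.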

Also, as a side benefit: if you have a multi-core CPU, try adding -parallel to the ifort command line :)
