Intel Compiler vs. GCC

Posted 2024-12-18 19:13:41

When I compile an application with Intel's compiler it is slower than when I compile it with GCC. The Intel compiler's output is more than 2x slower. The application contains several nested loops. Are there any differences between GCC and the Intel compiler that I am missing? Do I need to turn on some other flags to improve the Intel compiler's performance? I expected the Intel compiler to be at least as fast as GCC.

Compiler Versions:

 Intel version  12.0.0 20101006 
 GCC   version  4.4.4  20100630

The compiler flags are the same with both compilers:

-O3 -openmp -parallel -mSSE4.2 -Wall -pthread

Comments (2)

他不在意 2024-12-25 19:13:41

I have no experience with the Intel compiler, so I can't say whether you are missing some flags.

However, from what I recall, recent versions of gcc are generally as good at optimizing code as icc (sometimes better, sometimes worse, although most sources seem to indicate generally better), so you might have run into a situation where icc does particularly badly. Examples of the optimizations each compiler can do can be found here and here. Even if gcc is not better in general, you could simply have a case that gcc recognizes as optimizable and icc doesn't. Compilers can be very picky about what they optimize and what they don't, especially with things like autovectorization.

If your loop is small enough, it might be worth comparing the assembly code generated by gcc and icc. Also, if you show some code, or at least tell us what you are doing in your loop, we might be able to give you better guesses about what leads to this behaviour. If it's a relatively small loop, icc is likely missing one (or a few, but probably not many) optimizations that either have inherently good potential (prefetching, autovectorization, unrolling, loop-invariant code motion, ...) or enable other optimizations (primarily inlining).
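To make the assembly comparison concrete, here is a minimal sketch. It assumes the hot code resembles a nested multiply-accumulate loop; the function `matvec` and its signature are hypothetical stand-ins, since the question does not show the real code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the asker's nested loops: y = A * x,
   where A is stored row-major as rows x cols floats. */
void matvec(const float *restrict a, const float *restrict x,
            float *restrict y, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i) {
        float sum = 0.0f;
        /* The inner loop is the candidate for unrolling and
           autovectorization; whether either compiler actually
           transforms it is what the assembly listing shows. */
        for (size_t j = 0; j < cols; ++j)
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}
```

Both compilers accept `-S` to stop after generating assembly, so something like `gcc -std=c99 -O3 -S matvec.c -o matvec.gcc.s` and `icc -std=c99 -O3 -S matvec.c -o matvec.icc.s` (file names are placeholders) should give two listings of the same loop to compare side by side.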

Note that I'm only talking about optimization potential when I compare gcc to icc. In the end, icc might typically generate faster code than gcc, not so much because it performs more optimizations but because it has a faster standard library implementation and because it is smarter about where to optimize. At high optimization levels gcc gets (or at least used to get) a little overeager about trading code size for theoretical runtime improvements. This can actually hurt performance, e.g. when a carefully unrolled and vectorized loop is only ever executed with 3 iterations.
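The 3-iteration case can be illustrated with a small sketch. The functions `scale` and `scale3` are hypothetical, and what a given compiler emits for them depends on its version and flags, so treat the comments as expectations to check in the assembly rather than guarantees.

```c
#include <stddef.h>

/* Generic version: n is unknown at compile time, so at high optimization
   levels a compiler may emit a vectorized and unrolled main loop plus
   setup and remainder code. If n is always 3 at run time, that extra
   code is pure overhead. */
void scale(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        v[i] *= k;
}

/* Fixed-size version: the trip count is visible at compile time, so the
   compiler can simply emit three multiplications. */
void scale3(float v[3], float k)
{
    for (size_t i = 0; i < 3; ++i)
        v[i] *= k;
}
```

Both functions compute the same result for three elements; the difference, if any, is in how much code surrounds the actual multiplications.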

峩卟喜欢 2024-12-25 19:13:41

I normally use -inline-level=1 -inline-forceinline to make sure that functions I have explicitly declared inline actually do get inlined. Other than that, I would expect ICC performance to be at least as good as gcc's. You will need to profile your code to see where the performance difference is coming from. If this is Linux, then I recommend using Zoom, which you can get on a free 30-day evaluation.
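As a minimal sketch of the kind of code those inlining flags are aimed at, assume the hot loop calls a small helper declared `inline`. The names `clamp01` and `clamp_all` are hypothetical and not taken from the question.

```c
#include <stddef.h>

/* Small helper explicitly declared inline. If the compiler leaves it as
   a real call, the loop below pays call overhead on every element and
   loses the chance to be optimized as a single unit. */
static inline float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Hot loop that depends on clamp01 being inlined. */
void clamp_all(float *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        v[i] = clamp01(v[i]);
}
```

Checking the generated assembly for a remaining call to `clamp01` inside the loop is one way to confirm whether the helper was actually inlined.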
