What optimization level (g++) do you use when comparing two different algorithms written in C++?



I have two algorithms written in C++. As far as I know, the conventional way to compare the performance of two algorithms (asymptotically they are the same) is to compile with -O0 -DNDEBUG (g++).
But I think that optimization level is unfair to one of them, because it uses the STL throughout. When compiled with -O0, the program that uses plain arrays is about 5 times faster than the STL-heavy one. But when I compile them with -O2 -DNDEBUG, the performance difference is much smaller.

Is there any way to get the best out of the STL at optimization level -O0 (I am taking a heavy performance hit in the vector [] operator)?

What optimization level (and possibly defines like -DNDEBUG) do you use when comparing two algorithms?

It would also be a great help if someone could give some idea of the trend in academic research on comparing the performance of algorithms written in C++.


OK, to isolate the issue of optimization level, I am now using one algorithm but two different implementations.

I have changed one of the functions from raw pointers (int and boolean) to std::vector<int> and std::vector<bool>. With -O0 -DNDEBUG the timings are 5.46 s (raw pointers) and 11.1 s (std::vector). With -O2 -DNDEBUG the timings are 2.02 s (raw pointers) and 2.21 s (std::vector). Same algorithm: one implementation uses 4 or 5 dynamic arrays of int and boolean, the other uses std::vector<int> and std::vector<bool> instead. They are identical in every other respect.

You can see that at -O0 the raw-pointer version is about twice as fast as the std::vector version, while at -O2 they are almost the same.

But I am really confused, because in academia, when people publish running-time results for algorithms, they compile the programs with -O0.

Are there some compiler options I am missing?
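(The question does not include the benchmark code itself; the following is a minimal sketch, assuming a simple fill-and-sum workload, of the kind of raw-pointer versus std::vector<int> comparison described above. The element count, the work done per element, and the function names are illustrative, not the asker's actual code.)

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sum through a raw pointer.
long long sum_raw(const int* data, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += data[i];
    return s;
}

// Same work, but through std::vector<int>::operator[].
long long sum_vec(const std::vector<int>& data) {
    long long s = 0;
    for (std::size_t i = 0; i < data.size(); ++i) s += data[i];
    return s;
}

int main() {
    const std::size_t n = 100000000;      // illustrative size
    std::vector<int> v(n, 1);
    int* raw = v.data();                  // same memory, accessed two ways

    auto time_it = [](auto&& f) {         // needs -std=c++14 or later
        auto t0 = std::chrono::steady_clock::now();
        long long r = f();
        auto t1 = std::chrono::steady_clock::now();
        std::printf("result=%lld  %.3f s\n", r,
                    std::chrono::duration<double>(t1 - t0).count());
    };

    // Compare e.g.:  g++ -O0 bench.cpp   versus   g++ -O2 -DNDEBUG bench.cpp
    time_it([&] { return sum_raw(raw, n); });
    time_it([&] { return sum_vec(v); });
}
```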


5 Answers

临风闻羌笛 2024-08-13 18:18:29


It depends on what you want to optimize for.

Speed

I suggest using -O2 -DNDEBUG -ftree-vectorize, and if your code is designed specifically to run on x86 or x86_64, add -msse2. This will give you a broad idea of how it will perform with GIMPLE.
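(As a hedged illustration of what -ftree-vectorize and -msse2 target, here is the kind of simple, dependence-free loop GCC's auto-vectorizer can typically turn into SSE code. The function is made up for illustration; the -fopt-info-vec diagnostic flag assumes a reasonably recent GCC.)

```cpp
#include <cstddef>

// A straightforward elementwise loop with no loop-carried dependence:
// the kind of code that -O2 -ftree-vectorize (plus -msse2 on x86) can
// typically rewrite to process several elements per SSE instruction.
void saxpy(float* __restrict y, const float* __restrict x,
           float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// g++ -O2 -ftree-vectorize -msse2 -fopt-info-vec -c saxpy.cpp
// reports which loops the vectorizer transformed.
```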

Size

I believe you should use -Os -fno-rtti -fno-exceptions -fomit-frame-pointer. This will minimize the size of the executable to a degree (assuming C++).


In both cases, the algorithm's speed is not compiler dependent, but a compiler can drastically change the way the code behaves if it can "prove" the transformation is safe.

GCC detects 'common' code such as hand-coded min() and max() and turns them into a single SSE instruction (on x86/x86_64 when -msse is set) or into a cmov when i686 is available (SSE has higher priority). GCC will also take the liberty of reordering loops, unrolling and inlining functions if it wants to, and even removing useless code.
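(To make the min()/max() point concrete, here is a hedged sketch of the hand-coded pattern being described; the function names are illustrative, and the exact instructions emitted depend on the GCC version and target flags.)

```cpp
#include <cstddef>

// A hand-written min: at -O0 this is a compare and a branch; at -O2 on
// x86 GCC typically emits a branchless cmov for the scalar case.
int my_min(int a, int b) {
    return (a < b) ? a : b;
}

// The same compare-and-select pattern inside a loop can map onto SSE
// min instructions once the loop is vectorized (floating-point
// reductions may additionally need -ffast-math).
float min_of(const float* x, std::size_t n) {
    float m = x[0];
    for (std::size_t i = 1; i < n; ++i)
        if (x[i] < m) m = x[i];
    return m;
}
```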

As for your latest edit:

You can see that at -O0 the raw pointers are about twice as fast as std::vector, while at -O2 they are almost the same.

That's because std::vector still has code that throws exceptions and may use RTTI. Try comparing with -O2 -DNDEBUG -ftree-vectorize -fno-rtti -fno-exceptions -fomit-frame-pointer, and you'll see that std::vector will be slightly better than your code. GCC knows what the 'built-in' types are and how to exploit them in real-world use, and will gladly do so - just like it knows what memset() and memcpy() do and how to optimize accordingly when the copy size is known.
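(As a small illustration of the memset()/memcpy() point, the following hedged sketch shows the kind of loops GCC's loop-idiom recognition commonly replaces with library calls once optimization is enabled; the exact level at which this happens depends on the GCC version, and the functions are made up for illustration.)

```cpp
#include <cstddef>

// A plain byte-zeroing loop: GCC's loop-idiom recognition commonly
// replaces the whole loop with a single call to memset().
void zero_bytes(unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) p[i] = 0;
}

// An element-by-element copy loop: likewise often turned into memcpy().
void copy_ints(int* dst, const int* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}
```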

天涯离梦残月幽梦 2024-08-13 18:18:29


Compiler optimizations usually won't change the complexity order of an algorithm, just the constant and linear scale factors. Compilers are fairly smart, but they're not that smart.

Are you going to be compiling your code for release with just -O0? Probably not. You might as well compare the performance of the algorithms when compiled with whatever compilation flags you actually intend to use.

幻想少年梦 2024-08-13 18:18:29


You have two algorithms implemented in C++. If you want to compare the relative performance of the two implementations then you should use the optimization level that you are going to use in your final product. For me, that's -O3.

If you want to analyse the complexity of an algorithm, then that's more of an analysis problem where you look at the overall count of operations that must be performed for different sizes and characteristics of inputs.

As a developer writing code where performance is an issue, it is a good idea to be aware of the range of optimizations that a compiler can, and is likely to, apply to your code. Not optimizing unfairly penalises code that is written clearly and designed to be easy for the compiler to optimize, compared with code that is already 'micro-optimized'.

离笑几人歌 2024-08-13 18:18:29


I see no reason not to compile and run them both at O2. Unless you're doing it as a purely academic exercise (and even if you were, it's very unlikely the optimizations would produce fundamental changes in the properties of the algorithm - though I think I'd be happy if GCC started turning O(N) source into O(lg N) assembly), you'll want information that's consistent with what you would get when actually running the final program. You most likely won't be releasing the program with O0 optimizations, so you don't want to compare the algorithms under O0 optimizations.

陌上青苔 2024-08-13 18:18:29


Such a comparison is less about fairness than about producing useful information. You should use the optimization level that you plan to use when/if the code is put into production. If you're basically doing research and don't personally plan to put it into production, you're stuck with the slightly more difficult job of guessing what somebody who would put it into production would probably do.

Realistically, even if you are doing development, not research, you're stuck with a little of that anyway -- it's nearly impossible to predict what optimization level you might eventually use with this particular code.

Personally, I usually use -O2 with gcc. My general rule of thumb is to use the lowest level of optimization that turns on automatic inlining. I write a lot of my code with the expectation that small functions will be inlined by the compiler -- and write the code specifically to assist in that (e.g. often using functors instead of functions). If the compiler isn't set to inline those functions, you're not getting what I really intended. The performance of code compiled that way doesn't really mean anything -- I certainly wouldn't plan on ever really using it that way.
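(A hedged sketch of the functor-versus-function point above: when a comparison is passed as a function pointer, the compiler generally has to call it indirectly, whereas a functor (or lambda) type lets the comparison be inlined into std::sort once inlining is enabled. The names are illustrative, not from the answer.)

```cpp
#include <algorithm>
#include <vector>

bool less_fn(int a, int b) { return a < b; }    // ordinary function

struct Less {                                   // functor: the type carries the call target
    bool operator()(int a, int b) const { return a < b; }
};

void sort_both(std::vector<int>& v, std::vector<int>& w) {
    // Passed as a function pointer: each comparison is generally an
    // indirect call unless the optimizer can see through the pointer.
    std::sort(v.begin(), v.end(), &less_fn);

    // Passed as a functor: the call target is part of the type, so with
    // inlining enabled (-O2) the comparison typically disappears into
    // the sort's generated code.
    std::sort(w.begin(), w.end(), Less{});
}
```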
