Why not always use compiler optimizations?

Posted 2024-12-11 02:36:52


One of the questions that I asked some time ago had undefined behavior, so compiler optimization was actually causing the program to break.

But if there is no undefined behavior in your code, then is there ever a reason not to use compiler optimization? I understand that sometimes, for debugging purposes, one might not want optimized code (please correct me if I am wrong). Other than that, on production code, why not always use compiler optimization?

Also, is there ever a reason to use, say, -O instead of -O2 or -O3?


Comments (11)

怎会甘心 2024-12-18 02:36:52


If there is no undefined behavior, but there is definite broken behavior (either deterministic normal bugs, or indeterminate like race-conditions), it pays to turn off optimization so you can step through your code with a debugger.

Typically, when I reach this kind of state, I like to do a combination of:

  1. debug build (no optimizations) and step through the code
  2. sprinkled diagnostic statements to stderr so I can easily trace the run path

If the bug is more devious, I pull out valgrind and drd, and add unit-tests as needed, both to isolate the problem and to ensure that, when the problem is found, the solution works as expected.

In some extremely rare cases, the debug code works, but the release code fails. When this happens, the problem is almost always in my code; aggressive optimization in release builds can reveal bugs caused by misunderstood lifetimes of temporaries and the like. But even in this kind of situation, having a debug build helps to isolate the issues.

In short, there are some very good reasons why professional developers build and test both debug (non-optimized) and release (optimized) binaries. IMHO, having both debug and release builds pass unit-tests at all times will save you a lot of debugging time.

疧_╮線 2024-12-18 02:36:52


Compiler optimisations have two disadvantages:

  1. Optimisations will almost always rearrange and/or remove code. This will reduce the effectiveness of debuggers, because there will no longer be a 1 to 1 correspondence between your source code and the generated code. Parts of the stack may be missing, and stepping through instructions may end up skipping over parts of the code in counterintuitive ways.
  2. Optimisation is usually expensive to perform, so your code will take longer to compile with optimisations turned on than otherwise. It is difficult to do anything productive while your code is compiling, so obviously shorter compile times are a good thing.

Some of the optimisations performed by -O3 can result in larger executables. This might not be desirable in some production code.

Another reason to not use optimisations is that the compiler that you are using may contain bugs that only exist when it is performing optimisation. Compiling without optimisation can avoid those bugs. If your compiler does contain bugs, a better option might be to report/fix those bugs, to change to a better compiler, or to write code that avoids those bugs completely.

If you want to be able to perform debugging on the released production code, then it might also be a good idea to not optimise the code.

×眷恋的温暖 2024-12-18 02:36:52


3 Reasons

  1. It confuses the debugger, sometimes
  2. It's incompatible with some code patterns
  3. Not worth it: slow or buggy, or takes too much memory, or produces code that's too big.

In case 2, imagine some OS code that deliberately changes pointer types. The optimizer may assume that an object cannot be referenced through a pointer of the wrong type, keep memory values cached in registers despite the aliasing, and get the "wrong"1 answer.

Case 3 is an interesting concern. Sometimes optimizers make code smaller but sometimes they make it bigger. Most programs are not the least bit CPU-bound and even for the ones that are, only 10% or less of the code is actually computationally-intensive. If there is any downside at all to the optimizer then it is only a win for less than 10% of a program.

If the generated code is larger, then it will be less cache-friendly. This might be worth it for a matrix algebra library with O(n³) algorithms in tiny little loops. But for something with more typical time complexity, overflowing the cache might actually make the program slower. Optimizers can be tuned for all this stuff, typically, but if the program is a web application, say, it would certainly be more developer-friendly if the compiler would just do the all-purpose things and allow the developer to just not open the fancy-tricks Pandora's box.


1. Such programs are usually not standard-conforming so the optimizer is technically "correct", but still not doing what the developer intended.

怂人 2024-12-18 02:36:52


The reason is that you develop one application (the debug build) while your customers run a completely different application (the release build). If testing resources are low and/or the compiler used is not very popular, I would disable optimization for release builds.

MS publishes numerous hotfixes for optimization bugs in their MSVC x86 compiler. Fortunately, I've never encountered one in real life. But this was not the case with other compilers: the SH4 compiler in MS Embedded Visual C++ was very buggy.

惟欲睡 2024-12-18 02:36:52


Two big reasons that I have seen arise from floating point math, and overly aggressive inlining. The former is caused by the fact that floating point math is extremely poorly defined by the C++ standard. Many processors perform calculations using 80 bits of precision, for instance, only dropping down to 64 bits when the value is put back into main memory. If one version of a routine flushes that value to memory frequently, while another grabs the value only once at the end, the results of the calculations can be slightly different. Just tweaking the optimizations for that routine may well be a better move than refactoring the code to be more robust to the differences.

Inlining can be problematic because, by its very nature, it generally results in larger object files. Perhaps this increase in code size is unacceptable for practical reasons: it needs to fit on a device with limited memory, for instance. Or perhaps the increase in code size makes the code slower. If a routine becomes big enough that it no longer fits in cache, the resultant cache misses can quickly outweigh the benefits inlining provided in the first place.

I frequently hear of people who, when working in a multi-threaded environment, turn off debugging and immediately encounter hordes of new bugs due to newly uncovered race conditions and whatnot. The optimizer just revealed the underlying buggy code here, though, so turning it off in response is probably ill advised.

彩虹直至黑白 2024-12-18 02:36:52


Just happened to me. The code generated by swig for interfacing Java is correct but won't work with -O2 on gcc.

爱殇璃 2024-12-18 02:36:52


Here is an example of why using an optimization flag is sometimes dangerous, and why our tests should cover most of the code in order to notice such an error.

Using clang (because gcc performs some optimizations even without an optimization flag, so its output is already affected):

File: a.cpp

#include <stdio.h>

// Deliberately shadows the C library's puts().
int puts(const char *str) {
    fputs("Hello, world!\n", stdout);
    return 1;
}

int main() {
    // At -O1, clang rewrites printf("Goodbye!\n") into puts("Goodbye!"),
    // which now resolves to the function above.
    printf("Goodbye!\n");
    return 0;
}

Without -Ox flag:

> clang -o withoutOptimization a.cpp; ./withoutOptimization

> Goodbye!

With -Ox flag:

> clang -o withO1 -O1 a.cpp; ./withO1

> Hello, world!

随遇而安 2024-12-18 02:36:52


Simple. Compiler optimization bugs.

三生一梦 2024-12-18 02:36:52


An optimization that is predicated on the idea that a program won't do X will be useful when processing tasks that don't involve doing X, but will be at best counter-productive when performing a task which could be best accomplished by doing X.

Because the C language is used for many purposes, the Standard deliberately allows compilers which are designed for specialized purposes to make assumptions about program behavior that would render them unsuitable for many other purposes. The authors of the Standard allowed implementations to extend the semantics of the language by specifying how they will behave in situations where the Standard imposes no requirements, and expected that quality implementations would seek to do so in cases where their customers would find it useful, without regard for whether the Standard required them to do so.

Programs that need to perform tasks not anticipated or accommodated by the Standard will often need to exploit constructs whose behavior is defined by many implementations, but not mandated by the Standard. Such programs are not "broken", but are merely written in a dialect that the Standard doesn't require that all implementations support.

As an example, consider the following function test and whether it satisfies the following behavioral requirements:

  1. If passed a value whose bottom 16 bits would match those of some power of 17, return the bottom 32 bits of that power of 17.

  2. Do not write to arr[65536] under any circumstances.

The code would appear like it should obviously meet the second requirement, but can it be relied upon to do so?

#include <stdint.h>
int arr[65537];
uint32_t doSomething(uint32_t x)
{
    uint32_t i=1;
    while ((uint16_t)i != x)
        i*=17;
    if (x < 65536)
        arr[x] = 1;
    return i;
}
void test(uint32_t x)
{
    doSomething(x);
}

If the code is fed to clang with a non-zero optimization level, the generated machine code for test will fail the second requirement if x is 65536, since the generated code will be equivalent to simply arr[x] = 1;. Clang will perform this "optimization" even at -O1, and none of the normal options to limit broken optimizations will prevent it other than those which force C89 or C99 mode.

☆獨立☆ 2024-12-18 02:36:52


Personally, I divide my codebase into three categories.
A) Pre-existing code, trusted source, reputable
B) I developed it, in development
C) I developed it, done and tested.

There's also
D) Pre-existing code, untrusted source or not reputable or buggy,

which I generally avoid.

A) I always compile with optimizations. If one of my Apps links to postgres, or more precisely, libpq.so.x, the client library - I clone a stable release, not whatever's on the master branch. The last stable release of Postgres at this point is 16.4, so that's what I grab. My default for this type of code in flags is :

-flto -O2 -march=znver4 -fuse-ld=lld -stdlib=libc++ 

and pulling

-lunwind

into the linker flags. If something goes wrong, I definitely am not delving into whatever library this is to figure out what the error is, but I report it on git.

B) I always compile those with full debug symbols, without optimizations. The reason is obvious: stepping through the code should be 1:1 between compiled app and source - any optimizations bar you from this.

C) Once I am certain my app is bug-free and has been tested as such, I try enabling all optimizations like in A) and subject it to the same unit tests the debug version went through. Once I am certain it holds up, I remove the -march flag and compare performance to the -march version, and based on this, I make the call whether to compile distinct versions based on the architecture of the deployment environment, or to set -march to something like x86-64-v3.

To answer your question more precisely: You compile with optimizations if you're 100% certain this code works, either because you've personally tested it, or if you know it's a release version of a major app/library which thousands of people use and compile everyday and which you can trust not to be untested.

温暖的光 2024-12-18 02:36:52


One example is short-circuit boolean evaluation. Something like:

if (someFunc() && otherFunc()) {
  ...
}

A 'smart' compiler might realize that someFunc will always return false for some reason, making the entire statement evaluate to false, and decide not to call otherFunc to save CPU time. It may only do that legally when it can prove otherFunc has no observable side effects; an optimizer that gets this analysis wrong and drops a call that, say, resets a global flag leaves your program in an unknown state.
