Can different optimization levels lead to functionally different code?
I am curious about the liberties that a compiler has when optimizing. Let's limit this question to GCC and C/C++ (any version, any flavour of standard):
Is it possible to write code which behaves differently depending on which optimization level it was compiled with?
The example I have in mind is printing different bits of text in various constructors in C++ and getting a difference depending on whether copies are elided (though I've not been able to make such a thing work).
Counting clock cycles is not permitted. If you have an example for a non-GCC compiler, I'd be curious, too, but I can't check it. Bonus points for an example in C. :-)
Edit: The example code should be standard compliant and not contain undefined behaviour from the outset.
Edit 2: Got some great answers already! Let me up the stakes a bit: The code must constitute a well-formed program and be standards-compliant, and it must compile to correct, deterministic programs in every optimization level. (That excludes things like race-conditions in ill-formed multithreaded code.) Also I appreciate that floating point rounding may be affected, but let's discount that.
I just hit 800 reputation, so I think I shall blow 50 reputation as bounty on the first complete example to conform to (the spirit of) those conditions; 25 if it involves abusing strict aliasing. (Subject to someone showing me how to send bounty to someone else.)
The portion of the C++ standard that applies is §1.9 "Program execution". It reads, in part:
So, yes, code may behave differently at different optimization levels, but (assuming the compiler is conforming at every level) it cannot behave observably differently.
EDIT: Allow me to correct my conclusion: Yes, code may behave differently at different optimization levels as long as each behavior is observably identical to one of the behaviors of the standard's abstract machine.
Floating point calculations are a ripe source for differences. Depending on how the individual operations are ordered, you can get more/less rounding errors.
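Here is a small illustration of the rounding point. It assumes IEEE 754 double arithmetic without excess precision, as with SSE2 on x86-64, and the function names are hypothetical. The same three additions, grouped two ways, give two different results:

```cpp
// Floating-point addition is not associative: the same three operands,
// grouped differently, are rounded differently (IEEE 754 double assumed).
double sum_grouped_left() {
    return (0.1 + 0.2) + 0.3;  // 0.1 + 0.2 already rounds up slightly
}

double sum_grouped_right() {
    return 0.1 + (0.2 + 0.3);  // 0.2 + 0.3 is exactly 0.5
}
```

By default GCC keeps the grouping as written, even at -O3. Flags such as -ffast-math, which enables -fassociative-math, permit it to regroup, and that is one way such a difference can appear between builds.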
Less than safe multi-threaded code can also have different results depending on how memory accesses are optimized, but that's essentially a bug in your code anyhow.
And as you mentioned, side effects in copy constructors can vanish when optimization levels change.
Only if you trigger a compiler's bug.
EDIT
This example behaves differently on gcc 4.5.2:
Compiled with -O0, it creates a program that crashes with a segmentation fault. Compiled with -O2, it creates a program that enters an endless loop.
OK, my flagrant play for the bounty, by providing a concrete example. I'll put together the bits from other people's answers and my comments.
For the purpose of different behaviour at different optimization levels, "optimization level A" shall denote gcc -O0 (I'm using version 4.3.4, but it doesn't matter much; I think any even vaguely recent version will show the difference I'm after), and "optimization level B" shall denote gcc -O0 -fno-elide-constructors.
Code is simple:
Output at optimization level A:
Output at optimization level B:
The code is totally legal, but the output is implementation-dependent because of copy constructor elision, and in particular it's sensitive to gcc's optimization flag that disables copy ctor elision.
Note that generally speaking, "optimization" refers to compiler transformations that can alter behavior that is undefined, unspecified or implementation-defined, but not behavior that is defined by the standard. So any example that satisfies your criteria necessarily is a program whose output is either unspecified or implementation-defined. In this case it's unspecified by the standard whether copy ctors are elided, I just happen to be lucky that GCC reliably elides them pretty much whenever allowed, but has an option to disable that.
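Below is a sketch of the kind of program this answer describes. It is not the answer's original code: the type Tracer and the helper functions are hypothetical. A copy constructor with an observable side effect makes elision visible:

```cpp
// Hypothetical tracer type: its copy constructor has an observable side
// effect (incrementing a counter), which is what makes elision visible.
struct Tracer {
    static int copies;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
};
int Tracer::copies = 0;

// Returning a named local: the copy into the caller may be elided (NRVO)
// or performed; the standard permits either.
Tracer make_tracer() {
    Tracer t;
    return t;
}

// Returns how many copies one call performed.
int copies_for_one_call() {
    Tracer::copies = 0;
    Tracer result = make_tracer();
    (void)result;
    return Tracer::copies;
}
```

To see the difference, print copies_for_one_call() from main and compile once with g++ -O0 and once with g++ -O0 -fno-elide-constructors. With GCC's default behaviour the count is typically 0. With elision disabled it becomes non-zero. The exact count may also depend on the language mode, because C++17 guarantees elision when initializing from a prvalue but not when returning a named local.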
For C, almost all operations are strictly defined in the abstract machine, and optimizations are only allowed if the observable result is exactly that of the abstract machine. Exceptions to that rule that come to mind:
- undefined behavior need not be consistent between different compiler runs or executions of the faulty code
- floating-point operations may be subject to different rounding
- arguments to function calls may be evaluated in any order
- an expression whose value is not used need not be evaluated unless it produces needed side effects; accesses to volatile-qualified objects always count as such side effects
- const-qualified compound literals may or may not be folded into one static memory location
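The order-of-evaluation item can be seen directly. This sketch is written in C++, which leaves the order of function arguments unspecified just as C does. The helper names are hypothetical, and the recorded order is allowed to differ between builds while the result stays the same:

```cpp
#include <string>

// Hypothetical helpers: each records that it ran, so the order is visible.
std::string trace;

int first_arg()  { trace += 'A'; return 1; }
int second_arg() { trace += 'B'; return 2; }

int add(int a, int b) { return a + b; }

// Which argument is evaluated first is unspecified: afterwards `trace`
// holds "AB" or "BA", while the returned sum is 3 either way.
int call_add() {
    trace.clear();
    return add(first_arg(), second_arg());
}
```

A program that prints trace is well-formed and free of undefined behavior, but its output is unspecified. That is the category of example the question is after.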
Anything that is Undefined Behavior according to the standard can change its behavior depending on optimization level (or moon-phase, for that matter).
Since copy constructor calls can be optimized away, even if they have side effects, having copy constructors with side-effects will cause unoptimized and optimized code to behave differently.
The -fstrict-aliasing option can easily cause changes in behavior if you have two pointers of different types to the same block of memory. This is supposed to be invalid but is actually quite common.
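Here is a sketch of the classic pattern and a well-defined alternative. The function names are hypothetical, and float_bits assumes a 32-bit IEEE 754 float. GCC enables -fstrict-aliasing at -O2 and above, which is why the aliasing version can change behaviour between -O0 and -O2:

```cpp
#include <cstdint>
#include <cstring>

// Under -fstrict-aliasing the compiler may assume an int* and a float*
// never point to the same object, so it may return the stored 1 without
// reloading *i. Calling this with both pointers aimed at the same object
// (via reinterpret_cast) is undefined behavior.
int store_and_reload(int* i, float* f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

// Well-defined way to inspect a float's bit pattern: copy the bytes.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With distinct objects, store_and_reload is fine at every optimization level. Only the aliasing call is undefined, so the type pun should go through memcpy instead.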
This C program invokes undefined behavior, but does display different results in different optimization levels:
GCC defines the __OPTIMIZE__ macro when a non-zero optimization level is used. You can use it like below:
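Here is one possible use, given as a sketch with a hypothetical function name. The macro is documented by GCC as defined in all optimizing compilations; GCC additionally defines __OPTIMIZE_SIZE__ when optimizing for size:

```cpp
// Reports whether this translation unit was compiled with optimization.
bool is_optimized_build() {
#ifdef __OPTIMIZE__
    return true;   // optimizing compilation (e.g. -O1, -O2, -O3, -Os)
#else
    return false;  // -O0, GCC's default
#endif
}
```

Printing different text depending on is_optimized_build() gives a well-formed, deterministic program whose output changes with the optimization level. It does so by asking the compiler directly, though, rather than by exposing the effect of any particular optimization.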
The same source code, like:
before enabling -finline-small-functions and after enabling -finline-small-functions:
-finline-small-functions is enabled at -O2/-O3.
Two different C programs:
foo6.c
bar6.c
When both modules are compiled into one executable at optimization levels one and zero, the program prints two different values: 0x48 for -O1 and 0x55 for -O0.
(Screenshot of terminal)
Here is an example of it working in my environment.
a.c:
b.c:
The output depends on whether the string-constant merging optimization (GCC's -fmerge-constants, on by default in optimized compilations) is enabled or disabled:
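Here is a single-file sketch of the unspecified behavior behind this answer, with hypothetical function names. Whether two identical string literals share storage is unspecified, so comparing their addresses is only as stable as the compiler's merging decisions. Because both literals live in one file here, this sketch does not by itself reproduce the -O0 versus -O1 difference, which needs the literals in separate translation units as in a.c/b.c:

```cpp
#include <cstring>

// Hypothetical stand-ins for functions living in a.c and b.c.
const char* literal_from_a() { return "hello"; }
const char* literal_from_b() { return "hello"; }

// Unspecified: may be true or false depending on constant merging.
bool same_address() { return literal_from_a() == literal_from_b(); }

// Well-defined: compares the characters, not the addresses.
bool same_contents() {
    return std::strcmp(literal_from_a(), literal_from_b()) == 0;
}
```

Code that needs a stable answer should compare contents, as same_contents does, rather than pointer values.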
Got an interesting example in my OS course today.
We analyzed a software mutex that could be broken by optimization because the compiler does not know about the parallel execution.
The compiler can reorder statements that do not operate on dependent data.
As I already stated, in parallelized code this dependency is hidden from the compiler, so the reordering can break the mutex.
The example I gave would make debugging hard, because thread safety is broken and your code behaves unpredictably due to OS scheduling issues and concurrent access errors.
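Here is a sketch of how such a software mutex can be written so that optimization cannot break it. It uses the textbook Peterson algorithm, not necessarily the mutex from the course. PetersonLock is a hypothetical name, and the sketch assumes C++11 std::atomic with its default sequentially consistent ordering, which makes the otherwise hidden dependency visible to the compiler:

```cpp
#include <atomic>

// Peterson's mutual exclusion for exactly two threads, with ids 0 and 1.
// With plain bool/int fields, the compiler (or CPU) could reorder the store
// to flag_[self] after the load of flag_[other], letting both threads in.
class PetersonLock {
public:
    void lock(int self) {
        const int other = 1 - self;
        flag_[self].store(true);  // announce interest (seq_cst)
        turn_.store(other);       // give priority to the other thread
        while (flag_[other].load() && turn_.load() == other) {
            // spin until the other thread leaves or yields its turn
        }
    }

    void unlock(int self) { flag_[self].store(false); }

private:
    std::atomic<bool> flag_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
};
```

In real use, thread 0 calls lock(0)/unlock(0) and thread 1 calls lock(1)/unlock(1) around the critical section. Marking the fields volatile instead would not be enough in C++, because volatile neither orders memory accesses between threads nor makes the data race well-defined.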