Why does gcc not remove the check of a non-volatile variable?

Posted on 2024-08-27 02:54:01

This question is mostly academic. I ask out of curiosity, not because this poses an actual problem for me.

Consider the following incorrect C program.

#include <signal.h>
#include <stdio.h>

static int running = 1;

void handler(int u) {
    running = 0;
}

int main() {
    signal(SIGTERM, handler);
    while (running)
        ;
    printf("Bye!\n");
    return 0;
}

This program is incorrect because the handler interrupts the program flow, so running can be modified at any time and should therefore be declared volatile. But let's say the programmer forgot that.

gcc 4.3.3, with the -O3 flag, compiles the loop body (after one initial check of the running flag) down to the infinite loop

.L7:
        jmp     .L7

which was to be expected.

Now we put something trivial inside the while loop, like:

    while (running)
        putchar('.');

And suddenly, gcc does not optimize the loop condition anymore! The loop body's assembly now looks like this (again at -O3):

.L7:
        movq    stdout(%rip), %rsi
        movl    $46, %edi
        call    _IO_putc
        movl    running(%rip), %eax
        testl   %eax, %eax
        jne     .L7

We see that running is re-loaded from memory each time through the loop; it is not even cached in a register. Apparently gcc now thinks that the value of running could have changed.

So why does gcc suddenly decide that it needs to re-check the value of running in this case?

静若繁花 2024-09-03 02:54:01

In the general case it's difficult for a compiler to know exactly which objects a function might have access to and therefore could potentially modify. At the point where putchar() is called, GCC doesn't know whether some putchar() implementation might be able to modify running, so it has to be somewhat pessimistic and assume that running might in fact have been changed.

For example, there might be a putchar() implementation later in the translation unit:

int putchar( int c)
{
    running = c;
    return c;
}

Even if there's no putchar() implementation in the translation unit, something else might, for example, pass the address of the running object somewhere so that putchar() is able to modify it:

void foo(void)
{
    set_putchar_status_location( &running);
}

Note that your handler() function is globally accessible, so putchar() might call handler() itself (directly or otherwise), which is an instance of the above situation.

(Struck out in the original answer and superseded by the next paragraph:) On the other hand, since running is visible only to the translation unit (being static), by the time the compiler gets to the end of the file it should be able to determine that there is no opportunity for putchar() to access it (assuming that's the case), and the compiler could go back and 'fix up' the pessimization in the while loop.

Since running is static, the compiler might be able to determine that it's not accessible from outside the translation unit and make the optimization you're talking about. However, since it's accessible through handler() and handler() is accessible externally, the compiler can't optimize the access away. Even if you make handler() static, it's accessible externally since you pass the address of it to another function.

Note that in your first example, even though the previous paragraph still applies, the compiler can optimize away the access to running, because the 'abstract machine model' the C language is based on doesn't take asynchronous activity into account except in very limited circumstances. One of those is the volatile keyword and another is signal handling, though the signal-handling requirements aren't strong enough to prevent the compiler from optimizing away the access to running in your first example.

In fact, here's what C99 says about the abstract machine's behavior in pretty much these exact circumstances:

5.1.2.3/8 "Program execution"
EXAMPLE 1:
An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.
Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics. In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage, as well as other implementation defined restrictions.

Finally, you should note that the C99 standard also says:

7.14.1.1/5 "The signal function"
If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t...

So strictly speaking the running variable may need to be declared as:

volatile sig_atomic_t running = 1;
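
As a minimal sketch of how that declaration fits into the program from the question: the flag is declared volatile sig_atomic_t and the handler only assigns to it. The function name spin_until_signal and its raise_after parameter are hypothetical additions, used here to simulate the signal arriving with raise() so the loop can be exercised without sending SIGTERM from outside.

```c
#include <assert.h>
#include <signal.h>

/* The one kind of static object a handler may portably assign to. */
static volatile sig_atomic_t running = 1;

static void handler(int signum) {
    (void)signum;   /* the signal number is not needed here */
    running = 0;
}

/* Spins until the flag is cleared and returns the number of completed spins.
   raise_after is a hypothetical knob: once that many spins have completed,
   the function raises SIGTERM itself to simulate the signal arriving. */
long spin_until_signal(long raise_after) {
    long spins = 0;

    assert(raise_after >= 0);
    running = 1;
    signal(SIGTERM, handler);

    while (running) {            /* volatile: re-read on every iteration */
        if (spins == raise_after)
            raise(SIGTERM);      /* handler runs before raise() returns */
        else
            spins++;
    }
    return spins;
}
```

Calling spin_until_signal(3) from main() returns 3. Because running is volatile, the compiler has to reload it on each pass even though the loop body contains no call it cannot see through on the iterations where raise() is skipped.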

夜灵血窟げ 2024-09-03 02:54:01

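Because the call to putchar() could change the value of running (GCC only knows that putchar() is an external function and doesn't know what it does; for all GCC knows, putchar() could call handler()).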

我不咬妳我踢妳 2024-09-03 02:54:01

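GCC probably assumes that the call to putchar can modify any global variable, including running.

Take a look at the pure function attribute, which declares that a function has no side effects on global state. I suspect that if you replace putchar() with a call to a "pure" function, GCC will reintroduce the loop optimization.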
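
A rough sketch of what that could look like with GCC's __attribute__((pure)). The names checksum, sum_while_running, and stop are hypothetical, and the loop is bounded by limit so it terminates. The attribute only permits the compiler to assume that checksum() does not write to running; whether a given GCC version actually hoists the load out of the loop depends on the version and options, so inspect the assembly rather than taking it for granted.

```c
#include <assert.h>
#include <stddef.h>

static int running = 1;

/* Writable from other translation units, so running is not a constant. */
void stop(void) {
    running = 0;
}

/* pure: may read memory but promises not to modify any global state.
   noinline keeps the call visible as a call in the generated code. */
__attribute__((pure, noinline))
int checksum(const char *s) {
    int sum = 0;
    while (*s != '\0')
        sum += (unsigned char)*s++;
    return sum;
}

/* Adds checksum(s) up to limit times, for as long as running stays set.
   Given the pure attribute, the compiler may check running only once. */
int sum_while_running(const char *s, int limit) {
    int total = 0;

    assert(s != NULL);
    for (int i = 0; i < limit && running; i++)
        total += checksum(s);
    return total;
}
```

With running still set, sum_while_running("ab", 3) returns 585 (three times 97 + 98); after stop() it returns 0.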

坠似风落 2024-09-03 02:54:01

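Thank you all for your answers and comments. They were very helpful, but none of them gives the full story. [Edit: Michael Burr's answer now does, which makes this somewhat redundant.] I'll sum up here.

Even though running is static, handler is not static; therefore it might be called from putchar and change running that way. Since the implementation of putchar is not known at this point, it could conceivably call handler from the body of the while loop.

Suppose handler were static. Could we then optimize away the running check? The answer is no, because the signal implementation is also outside this compilation unit. For all gcc knows, signal might store the address of handler somewhere (which, in fact, it does), and putchar might then call handler through this pointer, even though it has no direct access to that function.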
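
The escape route described above can be sketched in a single file. Here my_signal and my_putchar are hypothetical stand-ins for the real library functions, which live in a different translation unit that the compiler cannot see; the rule that '!' triggers the callback is invented purely so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* ---- stand-in for the library side (normally a separate translation unit) ---- */

typedef void (*callback_fn)(int);

static callback_fn saved_callback = NULL;   /* where the "library" keeps the pointer */

/* Hypothetical stand-in for signal(): remembers the handler's address. */
void my_signal(callback_fn fn) {
    saved_callback = fn;
}

/* Hypothetical stand-in for putchar(): it may call back through the pointer. */
int my_putchar(int c) {
    if (c == '!' && saved_callback != NULL)
        saved_callback(0);
    return c;
}

/* ---- the user's translation unit ---- */

static int running = 1;

/* static, yet reachable from outside once its address has been handed out */
static void handler(int u) {
    (void)u;
    running = 0;
}

/* Feeds text to my_putchar() until running is cleared or the text ends;
   returns the number of characters passed on. */
int print_until_stopped(const char *text) {
    int printed = 0;

    assert(text != NULL);
    running = 1;
    my_signal(handler);                     /* handler's address escapes here */

    while (running && text[printed] != '\0') {
        my_putchar(text[printed]);          /* may clear running via the pointer */
        printed++;
    }
    return printed;
}
```

print_until_stopped("a!bcd") returns 2: the second call to my_putchar() reaches handler through the saved pointer and clears running. When the library half is compiled separately, the compiler cannot rule this out, so it has to reload running after every call.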

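So in which cases can the running check be optimized away? It seems this is only possible if the loop body does not call any function from outside this translation unit, so that it is known at compile time what does and does not happen inside the loop body.

This explains why forgetting a volatile is not as big a deal in practice as it might seem at first.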