Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

Posted on 2024-10-07 21:27:32

Suppose that we have the following bit of code:

#include <pthread.h>
#include <stdbool.h>   // needed for bool when this is compiled as C
#include <stdio.h>
#include <stdlib.h>

void guarantee(bool cond, const char *msg) {
    if (!cond) {
        fprintf(stderr, "%s\n", msg);
        exit(1);
    }
}

bool do_shutdown = false;   // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in Thread 1. Intended behavior is to block until
trigger_shutdown() is called. */
void wait_for_shutdown_signal() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    while (!do_shutdown) {   // while loop guards against spurious wakeups
        res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
        guarantee(res == 0, "Could not wait for shutdown cond");
    }

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

/* Called in Thread 2. */
void trigger_shutdown() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    do_shutdown = true;

    res = pthread_cond_signal(&shutdown_cond);
    guarantee(res == 0, "Could not signal shutdown cond");

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?

The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.

In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?

Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

仄言 2024-10-14 21:27:32

Of course the current C and C++ standards say nothing on the subject.

As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.

When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and a necessary property of a memory barrier is that the compiler must assume globals can change across it.
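As a minimal sketch of what "changes made under the lock will be visible under the lock in other threads" means in practice, the following C fragment has two threads increment a shared, non-volatile counter while holding a mutex. The names counter, counter_mutex, add_many and run_counter_demo are hypothetical and do not come from the question; the sketch assumes a POSIX implementation whose mutexes synchronize memory as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: deliberately not volatile. */
static long counter = 0;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: increments the counter `iterations` times, each time
   under the mutex, so every read-modify-write sees the latest value. */
static void *add_many(void *arg) {
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        int res = pthread_mutex_lock(&counter_mutex);
        assert(res == 0);
        counter++;
        res = pthread_mutex_unlock(&counter_mutex);
        assert(res == 0);
        (void)res;
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value.
   pthread_join() is itself a synchronizing call, so reading the
   counter after both joins is safe. */
long run_counter_demo(long iterations) {
    pthread_t a, b;
    int res;

    counter = 0;
    res = pthread_create(&a, NULL, add_many, &iterations);
    assert(res == 0);
    res = pthread_create(&b, NULL, add_many, &iterations);
    assert(res == 0);
    res = pthread_join(a, NULL);
    assert(res == 0);
    res = pthread_join(b, NULL);
    assert(res == 0);
    (void)res;
    return counter;
}
```

If the implementation were allowed to keep counter in a register across the lock and unlock calls, increments from the other thread could be lost; on an implementation that honors the "synchronize memory access" wording, run_counter_demo(n) should return 2 * n.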

Hans Boehm's paper "Threads Cannot be Implemented as a Library" covers some specific issues that must be addressed for pthreads to actually be usable, but that were not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm about what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".

Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_shutdown in a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU's local cache between the synchronizing operation and reading do_shutdown. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.

That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.

"If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?"

Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
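To illustrate the kind of reuse described above, here is a hedged sketch in C. The names stop_flag, count_as_written, count_as_hoisted and set_stop_flag are hypothetical. The second function is a hand-written version of what an optimizer could legitimately produce from the first, assuming it can prove that nothing in the loop modifies the flag and that no memory barrier occurs; it is not the output of any particular compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: internal linkage, not volatile. */
static bool stop_flag = false;

/* As written: the abstract machine reads stop_flag on every iteration. */
int count_as_written(int limit) {
    assert(limit >= 0);
    int n = 0;
    while (!stop_flag && n < limit) {
        n++;
    }
    return n;
}

/* A transformation the compiler may apply: the loop body contains no
   call and no barrier, so the flag is loaded once and the cached copy
   is reused for the whole loop. */
int count_as_hoisted(int limit) {
    assert(limit >= 0);
    bool cached = stop_flag;   /* single load, possibly kept in a register */
    int n = 0;
    while (!cached && n < limit) {
        n++;
    }
    return n;
}

/* Helper so that callers can change the flag between runs. */
void set_stop_flag(bool value) {
    stop_flag = value;
}
```

Within a single thread the two functions always return the same value, which is why the substitution is allowed. Only the first version could ever notice another thread setting stop_flag in the middle of the loop, and without synchronization nothing obliges the implementation to preserve that difference.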

[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.

颜漓半夏 2024-10-14 21:27:32

Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.

As far as I know there's nothing directly said about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits that reading of the variable to be optimized away only if the behavior can be proven to be "as if" it were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.

Also note that the pthread library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(); see the related question "Does guarding a variable with a pthread mutex guarantee it's also not cached?"

Now, if do_shutdown were static (no external linkage) and you had several threads that used that static variable defined in the same module (i.e., the address of the static variable was never taken to be passed to another module), that might be a different story. For example, say that you have a single function that uses such a variable, and you start several thread instances running that function. In that case, a standards-conforming compiler implementation might cache the value across function calls, since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).

So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering; you should rely on APIs provided by pthreads or the OS to ensure that. (As a side note, recent versions of Microsoft's compilers do document that volatile enforces full memory barriers, but I've read opinions that indicate this isn't required by the standard.)
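The static-variable case above can be sketched as follows, relying only on the pthread API for visibility rather than on volatile. This is an illustration under assumptions, not the asker's code: shutdown_requested, flag_mutex, flag_cond, waiter and run_shutdown_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file-local flag: static, not volatile, and only ever
   accessed while flag_mutex is held. */
static bool shutdown_requested = false;
static pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flag_cond = PTHREAD_COND_INITIALIZER;

/* Waiter thread: blocks until the flag is set, then records what it saw. */
static void *waiter(void *arg) {
    bool *saw_shutdown = arg;
    int res = pthread_mutex_lock(&flag_mutex);
    assert(res == 0);
    while (!shutdown_requested) {   /* loop guards against spurious wakeups */
        res = pthread_cond_wait(&flag_cond, &flag_mutex);
        assert(res == 0);
    }
    *saw_shutdown = shutdown_requested;
    res = pthread_mutex_unlock(&flag_mutex);
    assert(res == 0);
    (void)res;
    return NULL;
}

/* Starts a waiter, requests shutdown under the mutex, and returns 1 if
   the waiter observed the request. */
int run_shutdown_demo(void) {
    pthread_t thread;
    bool saw_shutdown = false;
    int res;

    shutdown_requested = false;     /* no other thread exists yet */
    res = pthread_create(&thread, NULL, waiter, &saw_shutdown);
    assert(res == 0);

    res = pthread_mutex_lock(&flag_mutex);
    assert(res == 0);
    shutdown_requested = true;
    res = pthread_cond_signal(&flag_cond);
    assert(res == 0);
    res = pthread_mutex_unlock(&flag_mutex);
    assert(res == 0);

    res = pthread_join(thread, NULL);
    assert(res == 0);
    (void)res;
    return saw_shutdown ? 1 : 0;
}
```

Because every access to the flag happens between a lock and an unlock of the same mutex, the sketch depends only on the memory synchronization that POSIX attributes to those calls, not on the flag's linkage or on volatile.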

情话难免假 2024-10-14 21:27:32

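Those hand-waving answers are all wrong. Sorry to be harsh.

There is no way that

"The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown."

If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.

It's absurd: a compiler cannot possibly understand what the pthread_ functions do unless it has built-in knowledge of POSIX threads.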

毅然前行 2024-10-14 21:27:32

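From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracking down a bug in a piece of code that was caused by a set of pointer assignments being cached and therefore unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still could not see the proper pointer values. Moving the pointer assignments and the associated mutex locking into a separate function did fix the problem.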
