Following a pointer in a multithreaded environment

Published 2024-10-11 15:14:30

If I have some code that looks something like:

typedef struct {
    bool some_flag;

    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;

// I assume the mutex has already been locked, and will be unlocked
// some time after this function returns. For clarity. Definitely not
// out of laziness ;)
void check_flag(foo_t* f) {
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
}

Is there anything in the C standard preventing an optimizer from rewriting check_flag as:

void check_flag(foo_t* f) {
    bool cache = f->some_flag;
    while(cache)
        pthread_cond_wait(&f->c, &f->m);
}

In other words, does the generated code have to follow the f pointer every time through the loop, or is the compiler free to pull the dereference out?

If it is free to pull it out, is there any way to prevent this? Do I need to sprinkle a volatile keyword somewhere? It can't be check_flag's parameter because I plan on having other variables in this struct that I don't mind the compiler optimizing like this.

Might I have to resort to:

void check_flag(foo_t* f) {
    volatile bool* cache = &f->some_flag;
    while(*cache)
        pthread_cond_wait(&f->c, &f->m);
}


4 Answers

四叶草在未来唯美盛开 2024-10-18 15:14:31

Volatile is for this purpose. Relying on the compiler to know about pthread coding practices seems a little nuts to me, though; compilers are pretty smart these days. In fact, the compiler probably sees that you are looping to test a variable and won't cache it in a register for that reason, not because it sees you using pthreads. Just use volatile if you really care.
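
If only the flag needs to escape the optimizer, one option (a sketch, not something this answer spells out) is to qualify just that member as volatile in the struct definition, leaving the other members free to be optimized:

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    volatile bool some_flag;   /* only this member is volatile; other
                                  members can still be optimized normally */
    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;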

Kind of a funny little note. We have a VOLATILE #define that is either "volatile" (when we think the bug can't possibly be our code...) or blank. When we think we have a crash due to the optimizer killing us, we #define it to "volatile", which puts volatile in front of almost everything. We then test to see if the problem goes away. So far... the bugs have been the developer and not the compiler! Who'd have thought!? We have developed a high-performance "non-locking" and "non-blocking" threading library. We have a test platform that hammers it to the point of thousands of races per second. So far, we have never detected a problem needing volatile! So far gcc has never cached a shared variable in a register. Yeah... we are surprised too. We are still waiting for our chance to use volatile!
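
A minimal sketch of the kind of VOLATILE switch described above (the macro and build-flag names are assumptions, not taken from the answer):

/* Toggle at build time (e.g. -DPARANOID_VOLATILE) to rule the optimizer
   in or out as the cause of a suspected race. */
#ifdef PARANOID_VOLATILE
#define VOLATILE volatile
#else
#define VOLATILE
#endif

typedef struct {
    VOLATILE int shared_counter;   /* hypothetical shared field */
} shared_state_t;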

风吹雨成花 2024-10-18 15:14:30

In the general case, even if multi-threading wasn't involved and your loop looked like:

void check_flag(foo_t* f) {
    while(f->some_flag)
        foo(&f->c, &f->m);
}

the compiler would be unable to cache the f->some_flag test. That's because the compiler can't know whether or not a function (like foo() above) might change whatever object f is pointing to.

Under special circumstances (foo() is visible to the compiler, and all pointers passed to check_flag() are known not to be aliased or otherwise modifiable by foo()), the compiler might be able to optimize the check.
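
For illustration, a hypothetical sketch of that special case (not code from the answer): with foo() fully visible and provably unable to touch *f, the compiler may legally hoist the load out of the loop.

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool some_flag;
    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;

/* Fully visible to the compiler and clearly unable to modify *f. */
static void foo(pthread_cond_t* c, pthread_mutex_t* m) {
    (void)c;
    (void)m;
}

void check_flag(foo_t* f) {
    /* After analyzing foo(), the load of f->some_flag may be hoisted:
       the loop either exits immediately or spins forever. */
    while (f->some_flag)
        foo(&f->c, &f->m);
}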

However, pthread_cond_wait() must be implemented in a way that would prevent that optimization.

See Does guarding a variable with a pthread mutex guarantee it's also not cached?

You might also be interested in Steve Jessop's answer to: Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

But how far you want to take the issues raised by Boehm's paper in your own work is up to you. As far as I can tell, if you want to take the stand that pthreads doesn't/can't make the guarantee, then you're in essence taking the stand that pthreads is useless (or at least provides no safety guarantees, which I think by reduction has the same outcome). While this might be true in the strictest sense (as addressed in the paper), it's also probably not a useful answer. I'm not sure what option you'd have other than pthreads on Unix-based platforms.

简美 2024-10-18 15:14:30

Normally, you should try to lock the pthread mutex before waiting on the condition object, since the pthread_cond_wait call releases the mutex (and reacquires it before returning). So your check_flag function should be rewritten like this to conform to the semantics of pthread condition variables.

void check_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
    pthread_mutex_unlock(&f->m);
}

Concerning the question of whether or not the compiler is allowed to optimize the reading of the flag field, this answer explains it in more detail than I can.

Basically, the compiler knows about the semantics of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock. It knows that it can't optimize memory reads across those calls (the call to pthread_cond_wait in this example). There is no notion of a memory barrier here, just special knowledge of certain functions, and some rules to follow in their presence.
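
In practice, an opaque external call acts roughly like a compiler barrier. A sketch of the explicit idiom (GCC/Clang inline assembly, an assumption not mentioned in the answer) that forces the compiler to discard cached reads at that point:

/* Emits no instructions, but tells the compiler that any memory may have
   changed, so values cached in registers must be reloaded afterwards. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")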

There is another thing protecting you from optimizations performed by the processor. Your average processor is able to reorder memory accesses (reads/writes) as long as the single-threaded semantics are preserved, and it does so all the time (because it improves performance). However, this breaks down when more than one processor can access the same memory address. A memory barrier is just an instruction telling the processor that it cannot move reads/writes issued before the barrier past it and execute them later; it has to finish them before continuing.
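
For reference, a minimal sketch of issuing such a barrier explicitly with C11 atomics (an assumption for illustration; the pthread lock/unlock calls already provide the synchronization this answer relies on):

#include <stdatomic.h>

void full_barrier(void) {
    /* Neither the compiler nor the processor may reorder loads or
       stores across this fence. */
    atomic_thread_fence(memory_order_seq_cst);
}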

攒眉千度 2024-10-18 15:14:30

As written, the compiler is free to cache the result as you describe, or even in a more subtle way: by keeping it in a register. You can prevent this optimization from taking place by making the variable volatile. But that is not necessarily enough, and you should not code it this way! You should use condition variables as prescribed (lock, wait, unlock).
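
For completeness, a hedged sketch of the other half of that prescribed pattern (the setter function is hypothetical, not part of the answer): the thread that changes the flag takes the same mutex, updates the flag, signals the condition, and unlocks.

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool some_flag;
    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;

/* Hypothetical setter: the waiter in check_flag() loops while the flag
   is set, so clearing it here lets that thread proceed. */
void clear_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    f->some_flag = false;
    pthread_cond_signal(&f->c);   /* or pthread_cond_broadcast for many waiters */
    pthread_mutex_unlock(&f->m);
}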

Trying to work around the library is bad, but it gets worse. Perhaps reading Hans Boehm's paper on the general topic from PLDI 2005 ("Threads Cannot be Implemented as a Library"), or many of his follow-on articles (which led up to the work on a revised C++ memory model), will put the fear of God in you and steer you back to the straight and narrow :).
