What happens if two threads read & write the same piece of memory?

Posted on 2024-09-15 21:02:10

It's my understanding that if two threads are reading from the same piece of memory, and no thread is writing to that memory, then the operation is safe. However, I'm not sure what happens if one thread is reading and the other is writing. What would happen? Is the result undefined? Or would the read just be stale? If a stale read is not a concern, is it OK to have unsynchronized read-write access to a variable? Or is it possible that the data would be corrupted, so that neither the read nor the write would be correct, and one should always synchronize in this case?

I want to say that I've learned it is the latter case, that a race on memory access leaves the state undefined... but I don't remember where I may have learned that, and I'm having a hard time finding the answer on Google. My intuition is that a variable is operated on in registers, and that true (as in hardware) concurrency is impossible (or is it?), so that the worst that could happen is stale data, i.e. the following:

WriteThread: copy value from memory to register
WriteThread: update value in register
ReadThread:  copy value of memory to register
WriteThread: write new value to memory

At which point the read thread has stale data.

Comments (3)

墨落成白 2024-09-22 21:02:10

Usually memory is read or written in atomic units determined by the CPU architecture (32-bit and 64-bit items aligned on 32-bit and 64-bit boundaries are common these days).

In this case, what happens depends on the amount of data being written.

Let's consider the case of 32-bit atomic read/write cells.

If two threads write 32 bits into such an aligned cell, then it is absolutely well defined what happens: one of the two written values is retained. Unfortunately for you (well, the program), you don't know which value. By extremely clever programming, you can actually use this atomicity of reads and writes to build synchronization algorithms (e.g., Dekker's algorithm), but it is typically faster to use architecturally defined locks instead.

If two threads write more than an atomic unit (e.g., they both write a 128-bit value), then the atomic-unit-sized pieces of the values written will each be stored in an absolutely well-defined way, but you won't know which pieces of which value get written in what order. So what may end up in storage is the value from the first thread, the value from the second thread, or a mix of atomic-unit-sized pieces from both threads.

Similar ideas hold for one thread reading and one thread writing, whether the access is a single atomic unit or larger.

Basically, you don't want to do unsynchronized reads and writes to memory locations, because you won't know the outcome, even though it may be very well defined by the architecture.

朦胧时间 2024-09-22 21:02:10

The result is undefined. Corrupted data is entirely possible. For an obvious example, consider a 64-bit value being manipulated by a 32-bit processor. Let's assume the value is a simple counter, and we increment it when the lower 32 bits contain 0xffffffff. The increment produces 0x00000000. When we detect that, we increment the upper word. If, however, some other thread reads the value between the time the lower word is incremented and the time the upper word is incremented, it gets a value with an un-incremented upper word but the lower word set to 0: a value completely different from what it would have been either before or after the increment completes.

倒数 2024-09-22 21:02:10

As I hinted in Ira Baxter's answer, CPU cache also plays a part on multicore systems. Consider the following test code:

DANGER, WILL ROBINSON!

The following code boosts priority to realtime to achieve somewhat more consistent results. Doing so requires admin privileges. Be careful when running the code on dual- or single-core systems, since your machine will lock up for the duration of the test run.

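#include <windows.h>
#include <stdio.h>

const int RUNFOR = 5000;            // duration of each test run, in milliseconds
volatile bool terminating = false;  // set by the main thread to stop the workers
volatile int value;                 // shared variable, deliberately unsynchronized

// Each worker repeatedly writes its own ID to the shared variable and counts
// how often the value it reads back is not the one it just wrote.
static DWORD WINAPI CountErrors(LPVOID parm)
{
    const int id = (int)(INT_PTR) parm;  // avoid a truncating pointer cast on 64-bit builds
    int errors = 0;
    while(!terminating)
    {
        value = id;
        if(value != id)
            errors++;
    }
    printf("\tThread %08X: %d errors\n", (unsigned) id, errors);
    return 0;
}

static void RunTest(DWORD_PTR affinity1, DWORD_PTR affinity2)
{
    terminating = false;
    DWORD dummy;
    HANDLE t1 = CreateThread(0, 0, CountErrors, (void*)0x1000, CREATE_SUSPENDED, &dummy);
    HANDLE t2 = CreateThread(0, 0, CountErrors, (void*)0x2000, CREATE_SUSPENDED, &dummy);

    // Pin each thread to its core before letting it run.
    SetThreadAffinityMask(t1, affinity1);
    SetThreadAffinityMask(t2, affinity2);
    ResumeThread(t1);
    ResumeThread(t2);

    printf("Running test for %d milliseconds with affinity %d and %d\n", RUNFOR, (int) affinity1, (int) affinity2);
    Sleep(RUNFOR);
    terminating = true;

    // Wait until both threads have seen the flag and printed their counts,
    // then release the thread handles.
    WaitForSingleObject(t1, INFINITE);
    WaitForSingleObject(t2, INFINITE);
    CloseHandle(t1);
    CloseHandle(t2);
}

int main()
{
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
    RunTest(1, 2);      // core 1 & 2
    RunTest(1, 4);      // core 1 & 3
    RunTest(4, 8);      // core 3 & 4
    RunTest(1, 8);      // core 1 & 4
}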

On my quad-core Intel Q6600 system (which, IIRC, has two pairs of cores, with each pair sharing an L2 cache; that would explain the results anyway ;)), I get the following results:

Running test for 5000 milliseconds with affinity 1 and 2
        Thread 00002000: 351883 errors
        Thread 00001000: 343523 errors
Running test for 5000 milliseconds with affinity 1 and 4
        Thread 00001000: 48073 errors
        Thread 00002000: 59813 errors
Running test for 5000 milliseconds with affinity 4 and 8
        Thread 00002000: 337199 errors
        Thread 00001000: 335467 errors
Running test for 5000 milliseconds with affinity 1 and 8
        Thread 00001000: 55736 errors
        Thread 00002000: 72441 errors