Simulate tearing a double in C#

Posted on 2024-12-28 17:45:48


I'm running on a 32-bit machine, and I can confirm that long values can tear using the following code snippet, which hits the assert very quickly.

        static void TestTearingLong()
        {
            System.Threading.Thread A = new System.Threading.Thread(ThreadA);
            A.Start();

            System.Threading.Thread B = new System.Threading.Thread(ThreadB);
            B.Start();
        }

        static ulong s_x;

        static void ThreadA()
        {
            int i = 0;
            while (true)
            {
                s_x = (i & 1) == 0 ? 0x0UL : 0xaaaabbbbccccddddUL;
                i++;
            }
        }

        static void ThreadB()
        {
            while (true)
            {
                ulong x = s_x;
                System.Diagnostics.Debug.Assert(x == 0x0UL || x == 0xaaaabbbbccccddddUL);
            }
        }

But when I try something similar with doubles, I can't get any tearing. Does anyone know why? As far as I can tell from the spec, among the floating-point types only assignment to a float is guaranteed to be atomic, so assignment to a double should carry a risk of tearing.

    static double s_x;

    static void TestTearingDouble()
    {
        System.Threading.Thread A = new System.Threading.Thread(ThreadA);
        A.Start();

        System.Threading.Thread B = new System.Threading.Thread(ThreadB);
        B.Start();
    }

    static void ThreadA()
    {
        long i = 0;

        while (true)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
            i++;

            if (i % 10000000 == 0)
            {
                Console.Out.WriteLine("i = " + i);
            }
        }
    }

    static void ThreadB()
    {
        while (true)
        {
            double x = s_x;

            System.Diagnostics.Debug.Assert(x == 0.0 || x == double.MaxValue);
        }
    }
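For reference, a torn read of s_x would combine the 32-bit halves of the two values that ThreadA alternates between. The sketch below (the class and method names are hypothetical, not from the thread) builds both possible torn values with BitConverter.DoubleToInt64Bits and BitConverter.Int64BitsToDouble. It assumes the tear splits the value at the 32-bit boundary, as a pair of separate 32-bit stores would, and shows that neither result equals 0.0 or double.MaxValue, so the assert in ThreadB would catch a tear if one occurred.

```csharp
using System;

static class TornDoubleSketch
{
    // Combine the high 32 bits of one double with the low 32 bits of another.
    // Assumption: this models a read landing between two separate 32-bit stores.
    public static double Combine(double highSource, double lowSource)
    {
        long high = BitConverter.DoubleToInt64Bits(highSource) & unchecked((long)0xFFFFFFFF00000000UL);
        long low = BitConverter.DoubleToInt64Bits(lowSource) & 0x00000000FFFFFFFFL;
        return BitConverter.Int64BitsToDouble(high | low);
    }

    // The two torn values possible when s_x alternates between 0.0 and double.MaxValue.
    public static double[] PossibleTears()
    {
        return new[]
        {
            Combine(double.MaxValue, 0.0), // bits 0x7FEFFFFF00000000: large but not MaxValue
            Combine(0.0, double.MaxValue), // bits 0x00000000FFFFFFFF: a tiny subnormal, not 0.0
        };
    }
}
```

Both torn results are ordinary finite doubles, so the equality checks in ThreadB are what make a tear visible.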


陌伤ぢ 2025-01-04 17:45:48

    static double s_x;

It is much harder to demonstrate the effect with a double. The CPU uses dedicated instructions to load and store a double, FLD and FSTP respectively, each of which transfers all 8 bytes in a single instruction. It is much easier with long, since in 32-bit mode there is no single instruction that loads or stores a 64-bit integer. To observe tearing of a double, you need the variable's address to be misaligned so that it straddles a CPU cache-line boundary.

That will never happen with the declaration you used: the JIT compiler ensures that the double is properly aligned, stored at an address that is a multiple of 8. You could store it in a field of a class instead, since the GC allocator only aligns to 4 in 32-bit mode, but that's a crapshoot.
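The alignment argument can be checked with a little arithmetic. This sketch is not from the answer: it assumes a 64-byte cache line (an assumption; adjust for your CPU), and the names CacheLineSketch, Straddles and StraddlingOffsets are hypothetical. It lists every offset within one cache line at which an 8-byte double, placed at a given alignment, would cross into the next line.

```csharp
using System.Collections.Generic;

static class CacheLineSketch
{
    // Assumed cache-line size in bytes.
    public const int LineSize = 64;

    // True if the bytes [address, address + size) touch two different cache lines.
    public static bool Straddles(long address, int size, int lineSize = LineSize)
    {
        return address / lineSize != (address + size - 1) / lineSize;
    }

    // Offsets within one cache line, stepping by 'alignment', where an 8-byte double straddles.
    public static List<int> StraddlingOffsets(int alignment, int lineSize = LineSize)
    {
        var result = new List<int>();
        for (int offset = 0; offset < lineSize; offset += alignment)
        {
            if (Straddles(offset, sizeof(double), lineSize))
                result.Add(offset);
        }
        return result;
    }
}
```

With 8-byte alignment StraddlingOffsets returns an empty list, so a JIT-aligned double can never cross a 64-byte line; with the 4-byte alignment of the 32-bit GC heap, exactly one of the sixteen possible slots (offset 60) does.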

The best way to do it is to misalign the double intentionally by using a pointer. Put unsafe in front of the Program class (and compile with /unsafe) and make it look similar to this:

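    // Needs: using System.Runtime.InteropServices; (for Marshal)
    static double* s_x;

    static void Main(string[] args) {
        // Unmanaged block; place the double at an offset that is divisible
        // by 4 but not by 8, so that it can cross a cache-line boundary.
        var mem = Marshal.AllocCoTaskMem(100);
        s_x = (double*)((long)mem + 28);
        TestTearingDouble();
    }

    // In ThreadA, write through the pointer:
        *s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
    // In ThreadB, read through the pointer:
        double x = *s_x;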

This still won't guarantee a good misalignment (hehe), since there's no way to control exactly where AllocCoTaskMem() places the allocation relative to the start of a CPU cache line. It also depends on the cache associativity of your CPU core (mine is a Core i5). You'll have to tinker with the offset; I got the value 28 by experimentation. The value should be divisible by 4 but not by 8 to truly simulate the GC heap behavior. Keep adding 8 to the value until the double straddles a cache line and triggers the assert.
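The "keep adding 8" procedure can also be done by arithmetic once the allocation's address is known. This is a sketch, not the answer's code: it assumes a 64-byte cache line, and the method name FindStraddlingOffset is hypothetical. Starting at 28 and stepping by 8 keeps the offset divisible by 4 but not by 8, and the search stays inside the 100-byte allocation.

```csharp
static class OffsetSearchSketch
{
    // Find the first offset (28, 36, 44, ...) at which an 8-byte double placed at
    // baseAddress + offset crosses a cache-line boundary. Returns -1 if none fits.
    // Assumption: lineSize defaults to 64 bytes.
    public static int FindStraddlingOffset(long baseAddress, int allocationSize = 100,
                                           int start = 28, int step = 8, int lineSize = 64)
    {
        for (int offset = start; offset + sizeof(double) <= allocationSize; offset += step)
        {
            long first = baseAddress + offset;
            long last = first + sizeof(double) - 1;
            if (first / lineSize != last / lineSize)
                return offset;
        }
        return -1;
    }
}
```

In the answer's Main, one might call FindStraddlingOffset((long)mem) and, when the result is not -1, use it in place of the hard-coded 28. Whether the resulting double actually tears still depends on the hardware.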

To make it less artificial, you'd have to write a program that stores the double in a field of a class and gets the garbage collector to move it around in memory so that it ends up misaligned. It's kinda hard to come up with a sample program that ensures this happens.

Also note how your program demonstrates a problem related to false sharing. Comment out the Start() call for thread B and note how much faster thread A runs: you are seeing the cost of the CPU keeping the cache line coherent between the cores. Here the sharing is intended, since both threads access the same variable; real false sharing happens when threads access different variables that happen to be stored in the same cache line. Cache lines are also why alignment matters: you can only observe tearing of a double when part of it is in one cache line and part of it is in another.
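The experiment suggested above (running the writer with and without the reader thread) can be timed with a Stopwatch. This is a minimal sketch, not the answer's code: the class, field and method names are hypothetical, the iteration count is arbitrary, and Volatile.Read/Volatile.Write are used only so the JIT cannot optimize the loops away. Timings depend entirely on the machine, so no particular numbers are implied.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SharingCostSketch
{
    static double s_x;            // the shared variable, as in the question
    static volatile bool s_stop;  // tells the reader thread to exit

    // Times a writer loop that alternates s_x between 0.0 and double.MaxValue,
    // optionally while another thread keeps reading the same variable.
    public static TimeSpan TimeWriter(long iterations, bool withReader)
    {
        s_stop = false;
        Thread reader = null;
        if (withReader)
        {
            reader = new Thread(() =>
            {
                double sink = 0;
                while (!s_stop)
                    sink += Volatile.Read(ref s_x); // keeps pulling the cache line to this core
                GC.KeepAlive(sink);
            });
            reader.IsBackground = true;
            reader.Start();
        }

        var watch = Stopwatch.StartNew();
        for (long i = 0; i < iterations; i++)
            Volatile.Write(ref s_x, (i & 1) == 0 ? 0.0 : double.MaxValue);
        watch.Stop();

        s_stop = true;
        reader?.Join();
        return watch.Elapsed;
    }
}
```

Comparing TimeWriter(n, withReader: false) with TimeWriter(n, withReader: true) on a multi-core machine is one way to see the coherence cost described above; the size of the gap varies by hardware.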
