Simulate tearing a double in C#

Posted on 2024-12-28 17:45:48

I'm running on a 32-bit machine, and I can confirm that long values can tear using the following code snippet, which hits the assert very quickly.

        static void TestTearingLong()
        {
            System.Threading.Thread A = new System.Threading.Thread(ThreadA);
            A.Start();

            System.Threading.Thread B = new System.Threading.Thread(ThreadB);
            B.Start();
        }

        static ulong s_x;

        static void ThreadA()
        {
            int i = 0;
            while (true)
            {
                s_x = (i & 1) == 0 ? 0x0L : 0xaaaabbbbccccddddL;
                i++;
            }
        }

        static void ThreadB()
        {
            while (true)
            {
                ulong x = s_x;
                // Fully qualified so the snippet compiles without a using directive.
                System.Diagnostics.Debug.Assert(x == 0x0L || x == 0xaaaabbbbccccddddL);
            }
        }

But when I try something similar with doubles, I can't get any tearing. Does anyone know why? As far as I can tell from the spec, only assignment to a float is atomic, so assignment to a double should be at risk of tearing.

    static double s_x;

    static void TestTearingDouble()
    {
        System.Threading.Thread A = new System.Threading.Thread(ThreadA);
        A.Start();

        System.Threading.Thread B = new System.Threading.Thread(ThreadB);
        B.Start();
    }

    static void ThreadA()
    {
        long i = 0;

        while (true)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
            i++;

            if (i % 10000000 == 0)
            {
                Console.Out.WriteLine("i = " + i);
            }
        }
    }

    static void ThreadB()
    {
        while (true)
        {
            double x = s_x;

            System.Diagnostics.Debug.Assert(x == 0.0 || x == double.MaxValue);
        }
    }
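
For reference, here is a minimal sketch of what a torn value would look like in the double test above, assuming a tear mixes the upper 32 bits of one written value with the lower 32 bits of the other. The `TornDoubleDemo` class and its `Combine` and `IsExpected` helpers are hypothetical names introduced only for illustration; they are not part of the original question.

```csharp
using System;

static class TornDoubleDemo
{
    // Builds the value a reader would see if it got the upper 32 bits
    // from one double and the lower 32 bits from another.
    public static double Combine(double highSource, double lowSource)
    {
        ulong high = (ulong)BitConverter.DoubleToInt64Bits(highSource) & 0xFFFFFFFF00000000UL;
        ulong low = (ulong)BitConverter.DoubleToInt64Bits(lowSource) & 0x00000000FFFFFFFFUL;
        return BitConverter.Int64BitsToDouble((long)(high | low));
    }

    // Same condition as the assert in ThreadB.
    public static bool IsExpected(double x)
    {
        return x == 0.0 || x == double.MaxValue;
    }

    static void Main()
    {
        double tornA = Combine(double.MaxValue, 0.0); // high half of MaxValue, low half of 0.0
        double tornB = Combine(0.0, double.MaxValue); // high half of 0.0, low half of MaxValue

        Console.WriteLine("tornA = {0:R}, expected? {1}", tornA, IsExpected(tornA));
        Console.WriteLine("tornB = {0:R}, expected? {1}", tornB, IsExpected(tornB));
    }
}
```

Both mixed patterns differ from 0.0 and from double.MaxValue, so the assert in ThreadB would fire if such a value were ever read.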

陌伤ぢ 2025-01-04 17:45:48

static double s_x;

It is much harder to demonstrate the effect when you use a double. The CPU uses dedicated instructions to load and store a double, FLD and FSTP respectively. It is much easier with long, since there is no single instruction that loads or stores a 64-bit integer in 32-bit mode. To observe tearing with a double, you need the variable's address to be misaligned so that it straddles a CPU cache line boundary.

That will never happen with the declaration you used: the JIT compiler ensures that the double is aligned properly, stored at an address that's a multiple of 8. You could store it in a field of a class, since the GC allocator only aligns to 4 in 32-bit mode. But that's a crap shoot.

The best way to do it is to intentionally misalign the double by using a pointer. Put unsafe in front of the Program class and make it look similar to this:

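    static double* s_x;

    static void Main(string[] args) {
        // Marshal lives in System.Runtime.InteropServices
        var mem = Marshal.AllocCoTaskMem(100);
        // 28 was found by experiment: divisible by 4 but not by 8
        s_x = (double*)((long)(mem) + 28);
        TestTearingDouble();
    }
    // ThreadA:
                *s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
    // ThreadB:
                double x = *s_x;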

This still won't guarantee a good misalignment (hehe), since there's no way to control exactly where AllocCoTaskMem() places the allocation relative to the start of a CPU cache line. It also depends on the cache associativity in your CPU core (mine is a Core i5). You'll have to tinker with the offset; I got the value 28 by experimentation. The value should be divisible by 4 but not by 8 to truly simulate the GC heap behavior. Keep adding 8 to the value until the double straddles a cache line and triggers the assert.
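
The offset hunt described above comes down to simple address arithmetic. The following is a minimal sketch, assuming a 64-byte cache line (common on x86, but an assumption here; the answer does not state the line size). `CacheLineMath`, `Straddles` and `FindStraddlingOffset` are hypothetical helper names introduced for illustration.

```csharp
using System;

static class CacheLineMath
{
    // Assumption: 64-byte cache lines. Check your CPU's actual line size.
    public const int AssumedCacheLineSize = 64;

    // True when a value of 'size' bytes starting at 'address' has its first
    // and last byte in different cache lines.
    public static bool Straddles(long address, int size, int lineSize)
    {
        long firstLine = address / lineSize;
        long lastLine = (address + size - 1) / lineSize;
        return firstLine != lastLine;
    }

    // Tries offsets 4, 12, 20, ... (divisible by 4 but not by 8) and returns
    // the first one at which a double would straddle a cache line, or -1 if
    // none does (for example when baseAddress itself is not 8-byte aligned).
    public static int FindStraddlingOffset(long baseAddress, int lineSize)
    {
        for (int offset = 4; offset < lineSize; offset += 8)
        {
            if (Straddles(baseAddress + offset, sizeof(double), lineSize))
            {
                return offset;
            }
        }
        return -1;
    }

    static void Main()
    {
        // Hypothetical base address that sits 32 bytes into a cache line.
        long baseAddress = 0x1000 + 32;
        Console.WriteLine(FindStraddlingOffset(baseAddress, AssumedCacheLineSize));
    }
}
```

In the unsafe program above, the base address would be `(long)mem`, and the returned offset would take the place of the hand-tuned 28. Any offset found this way stays below 64, so it fits inside the 100-byte allocation together with the 8-byte double.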

To make it less artificial, you'll have to write a program that stores the double in a field of a class and gets the garbage collector to move it around in memory so that it ends up misaligned. It's kind of hard to come up with a sample program that ensures this happens.

Also note how your program can demonstrate a problem called false sharing. Comment out the Start() method call for thread B and note how much faster thread A runs. You are seeing the cost of the CPU keeping the cache line consistent between the CPU cores. Sharing is intended here, since the threads access the same variable. Real false sharing happens when threads access different variables that are stored in the same cache line. This is also why alignment matters: you can only observe tearing for a double when part of it is in one cache line and part of it is in another.
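
As an aside on the false sharing point, one common way to keep two independently updated variables out of the same cache line is explicit padding. This is a minimal sketch, again assuming 64-byte cache lines; the `PaddedCounters` struct and its fields are hypothetical and not part of the original program.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumption: 64-byte cache lines. The two fields are placed 64 bytes apart,
// so they can never fall into the same 64-byte line, wherever the struct starts.
[StructLayout(LayoutKind.Explicit, Size = 128)]
struct PaddedCounters
{
    [FieldOffset(0)] public long A;   // updated by one thread
    [FieldOffset(64)] public long B;  // updated by another thread
}

static class PaddingDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.OffsetOf<PaddedCounters>("A"));
        Console.WriteLine(Marshal.OffsetOf<PaddedCounters>("B"));
        Console.WriteLine(Marshal.SizeOf<PaddedCounters>());
    }
}
```

The trade-off is memory: each padded pair takes 128 bytes instead of 16.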

丶情人眼里出诗心の 2025-01-04 17:45:48

As strange as it sounds, that depends on your CPU. While doubles are not guaranteed not to tear, they won't on many current processors. Try an AMD Sempron if you want tearing in this situation.

EDIT: Learned that the hard way a few years ago.
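
Since tearing is not ruled out and depends on the hardware, code that must never observe a torn double can go through `System.Threading.Interlocked`, which has overloads for double. A minimal sketch follows; the `AtomicDouble` wrapper and its `Read` and `Write` methods are hypothetical names, not something from the question.

```csharp
using System;
using System.Threading;

static class AtomicDouble
{
    static double s_x;

    // Atomically replaces the stored value.
    public static void Write(double value)
    {
        Interlocked.Exchange(ref s_x, value);
    }

    // Atomic read: CompareExchange returns the original value. Swapping
    // 0.0 for 0.0 leaves the stored value unchanged in every case.
    public static double Read()
    {
        return Interlocked.CompareExchange(ref s_x, 0.0, 0.0);
    }

    static void Main()
    {
        Write(double.MaxValue);
        Console.WriteLine(Read() == double.MaxValue);
    }
}
```

This costs more than a plain load or store, so it is a trade of speed for a guarantee that does not depend on the CPU model.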

何时共饮酒 2025-01-04 17:45:48

Doing some digging, I've found some interesting reads concerning floating-point operations on x86 architectures:

According to Wikipedia, the x86 floating-point unit stores floating-point values in 80-bit registers:

[...] subsequent x86 processors then integrated this x87 functionality
on chip which made the x87 instructions a de facto integral part of
the x86 instruction set. Each x87 register, known as ST(0) through
ST(7), is 80 bits wide and stores numbers in the IEEE floating-point
standard double extended precision format.

Also this other SO question is related: Some floating point precision and numeric limits question

This could explain why, although doubles are 64 bits wide, they are operated on atomically.

掌心的温暖 2025-01-04 17:45:48

For what it's worth, this topic and a code sample can be found here:

http://msdn.microsoft.com/en-us/magazine/cc817398.aspx
