Parallelism: slightly different floating-point results?

Posted on 2024-11-01 20:44:37


I'm trying to debug my parallelism library for the D programming language. A recently filed bug report indicates that the low-order bits of some floating-point results computed using tasks are non-deterministic across runs. (If you read the report, note that parallel reduce works under the hood by creating tasks in a deterministic way.)

This doesn't appear to be a rounding mode issue, because I tried setting the rounding mode manually. I'm also pretty sure this is not a concurrency bug. The library is well-tested (including passing a Jinx stress test), the issue is always confined to the low-order bits, and it happens even on single-core machines, where low-level memory model issues are less of a problem. What are some other reasons why floating point results might differ depending on what thread the operations are scheduled on?

Edit: I'm doing some printf debugging here and it seems like the results for the individual tasks are sometimes different across runs.

Edit #2: The following code reproduces this issue in a much simpler way. It sums the terms of an array in the main thread, then starts a new thread to execute the exact same function. The problem is definitely not a bug in my library, because this code doesn't even use my library.

import std.algorithm, core.thread, std.stdio, core.stdc.fenv;

real sumRange(const(real)[] range) {
    writeln("Rounding mode:  ", fegetround);  // 0 from both threads.
    return reduce!"a + b"(range);
}

void main() {
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;

    auto terms = new real[1_000_000];
    foreach(i, ref term; terms) {
        immutable x = ( i - 0.5 ) * delta;
        term = delta / ( 1.0 + x * x ) * 1;
    }

    immutable res1 = sumRange(terms);
    writefln("%.19f", res1);

    real res2;
    auto t = new Thread( { res2 = sumRange(terms); } );
    t.start();
    t.join();
    writefln("%.19f", res2);
}

Output:

Rounding mode: 0

0.7853986633972191094

Rounding mode: 0

0.7853986633972437348

Another Edit

Here's the output when I print in hex instead:

Rounding mode: 0

0x1.921fc60b39f1331cp-1

Rounding mode: 0

0x1.921fc60b39ff1p-1

Also, this only seems to happen on Windows. When I run this code on a Linux VM, I get the same answer for both threads.
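
A note on the hex output above: the second value carries 13 hex digits after the point (a 52-bit fraction), which is what a result rounded to double precision looks like, while the first uses the full 64-bit significand of an x87 real. The following is a minimal sketch, not code from the bug report, that roughly emulates this by rounding every partial sum to double. The function names sumExtended and sumRoundedToDouble are hypothetical, and it assumes a platform where real is 80-bit extended precision; where real is the same as double, the two sums will match.

```d
import std.stdio;

// Sum with a real accumulator (80-bit extended precision on x87 targets).
real sumExtended(const(real)[] terms) {
    real acc = 0;
    foreach (t; terms) acc += t;
    return acc;
}

// Sum while rounding every partial sum to double. This is only a rough
// stand-in (an assumption of this sketch) for an FPU whose precision
// setting has been reduced to a 53-bit significand.
real sumRoundedToDouble(const(real)[] terms) {
    double acc = 0;
    foreach (t; terms) acc = cast(double)(acc + t);
    return acc;
}

void main() {
    enum n = 1_000_000;
    immutable real delta = 1.0L / n;

    // Same terms as in the reproduction above.
    auto terms = new real[n];
    foreach (i, ref term; terms) {
        immutable real x = (i - 0.5L) * delta;
        term = delta / (1.0L + x * x);
    }

    writefln("%a", sumExtended(terms));
    writefln("%a", sumRoundedToDouble(terms));
}
```

Rounding through a cast only approximates what the FPU does when its precision setting is reduced, so the low-order digits need not match the output above exactly.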

ANSWER: It turns out that the root cause is that, on Windows, D initializes the floating-point state differently on the main thread than on other threads. See the bug report I just filed.
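
One way to check for this kind of difference is to read the x87 control word from each thread and compare the precision-control field (bits 8 and 9). The sketch below is an illustration under assumptions, not the code from the bug report: the helper names x87ControlWord, precisionControl, report and worker are hypothetical, and it assumes an x86 or x86-64 target and a compiler that accepts DMD-style inline assembly.

```d
import core.thread, std.stdio;

// Reads the x87 FPU control word of the calling thread.
// Assumes x86/x86-64 and DMD-style inline assembly.
ushort x87ControlWord() {
    ushort cw;
    asm { fstcw cw; }
    return cw;
}

// Bits 8-9 of the control word select the significand precision:
// 0 = 24 bits, 2 = 53 bits, 3 = 64 bits.
uint precisionControl(ushort cw) {
    return (cw >> 8) & 3;
}

void report(string name) {
    immutable cw = x87ControlWord();
    writefln("%s: control word 0x%04x, precision control %s",
             name, cw, precisionControl(cw));
}

void worker() {
    report("new thread");
}

void main() {
    report("main thread");
    auto t = new Thread(&worker);
    t.start();
    t.join();
}
```

If the two threads print different precision-control values, additions performed in them round to different significand widths, which would account for results that agree except in the low-order bits.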
