Parallelism: floating-point results slightly different?

Posted 2024-11-01 20:44:37


I'm trying to debug my parallelism library for the D programming language. A recently filed bug report indicates that the low-order bits of some floating-point operations performed using tasks are non-deterministic across runs. (If you read the report, note that parallel reduce works under the hood by creating tasks in a deterministic way.)

This doesn't appear to be a rounding-mode issue, because I tried setting the rounding mode manually. I'm also fairly sure this is not a concurrency bug: the library is well tested (including passing a Jinx stress test), the issue is always confined to the low-order bits, and it happens even on single-core machines, where low-level memory-model issues are less of a problem. What other reasons could make floating-point results differ depending on which thread the operations are scheduled on?
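One candidate worth ruling out is per-thread floating-point state. The rounding direction (and, on x87, the precision-control field) lives in control registers that the OS saves and restores for each thread, so two threads can run identical code under different settings. The sketch below shows that isolation using only `core.stdc.fenv` and `core.thread`. `roundingModeSeenByChild` is a hypothetical helper, and the demo flips the rounding mode rather than the precision because the C `fenv` interface has no portable precision-control call.

```d
import core.stdc.fenv;
import core.thread;
import std.stdio;

/// Hypothetical helper: run `work` on a new thread, wait for it, and
/// return the rounding mode that thread reports afterwards.
int roundingModeSeenByChild(void delegate() work)
{
    int mode;
    auto t = new Thread({
        work();
        mode = fegetround();
    });
    t.start();
    t.join();
    return mode;
}

void main()
{
    immutable before = fegetround();

    // The child switches *its own* rounding mode to round-upward...
    immutable child = roundingModeSeenByChild({ fesetround(FE_UPWARD); });

    // ...while the parent's control state is untouched, because the
    // FP control registers are part of each thread's saved context.
    immutable after = fegetround();

    writeln("child sees FE_UPWARD: ", child == FE_UPWARD);
    writeln("parent unchanged:     ", before == after);
}
```

Both lines should print `true`: a setting made in one thread does not leak into another, which also means a new thread only gets whatever state the OS and runtime hand it at creation.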

Edit: I've been doing some printf debugging, and it seems the results of individual tasks sometimes differ across runs.

Edit #2: The following code reproduces the issue much more simply. It sums the terms of an array in the main thread, then starts a new thread to execute the exact same function. The problem is definitely not a bug in my library, because this code doesn't even use it.

import std.algorithm, core.thread, std.stdio, core.stdc.fenv;

real sumRange(const(real)[] range) {
    writeln("Rounding mode:  ", fegetround);  // 0 from both threads.
    return reduce!"a + b"(range);
}

void main() {
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;

    // n terms of delta / (1 + x^2); their sum is roughly pi/4.
    auto terms = new real[n];
    foreach(i, ref term; terms) {
        immutable x = ( i - 0.5 ) * delta;
        term = delta / ( 1.0 + x * x );
    }

    // Sum in the main thread.
    immutable res1 = sumRange(terms);
    writefln("%.19f", res1);

    // Sum the same array again in a freshly started thread.
    real res2;
    auto t = new Thread( { res2 = sumRange(terms); } );
    t.start();
    t.join();
    writefln("%.19f", res2);
}

Output:

Rounding mode: 0
0.7853986633972191094
Rounding mode: 0
0.7853986633972437348

Another edit: here's the output when I print in hex instead:

Rounding mode: 0
0x1.921fc60b39f1331cp-1
Rounding mode: 0
0x1.921fc60b39ff1p-1

Also, this only seems to happen on Windows. When I run this code on a Linux VM, I get the same answer for both threads.
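The hex output is the telling part. `%a` shows 16 hex digits after the point for the main thread but only 13 (52 bits, exactly the fraction width of a `double`) for the new thread, which suggests the second sum was rounded to double precision along the way. Also note that `fegetround` reports only the rounding direction, so it would print 0 in both threads even if the x87 precision-control field differed. The sketch below reproduces the pattern deterministically and without threads, by accumulating the same terms once in a `real` and once in a `double`. The helper names are hypothetical, and the `double` accumulator only roughly emulates a 53-bit precision setting; it does not change any FPU state.

```d
import std.math : abs;
import std.stdio : writefln;

/// Hypothetical helper: the same n terms as the repro above,
/// delta / (1 + x^2), whose sum is roughly pi/4.
real[] makeTerms(size_t n)
{
    immutable delta = 1.0 / n;
    auto terms = new real[n];
    foreach (i, ref term; terms)
    {
        immutable x = (i - 0.5) * delta;
        term = delta / (1.0 + x * x);
    }
    return terms;
}

/// Accumulate in `real` (the 80-bit x87 format on x86), as
/// reduce!"a + b" does for a real[] in the repro.
real sumInReal(const(real)[] terms)
{
    real acc = 0;
    foreach (t; terms)
        acc += t;
    return acc;
}

/// Accumulate in `double`: each partial sum is stored back with a 53-bit
/// significand, roughly emulating an FPU whose precision control is double.
double sumInDouble(const(real)[] terms)
{
    double acc = 0;
    foreach (t; terms)
        acc += t;   // real-valued sum, converted back to double on assignment
    return acc;
}

void main()
{
    auto terms = makeTerms(1_000_000);
    immutable r = sumInReal(terms);
    immutable d = sumInDouble(terms);
    writefln("real accumulator:   %a", r);
    writefln("double accumulator: %a", d);
    writefln("difference:         %g", r - d);
}
```

On x86 the first line can show up to 16 hex digits and the second at most 13, mirroring the two outputs above. On 32-bit x87 builds the compiler may keep `acc` in an extended-precision register, which is itself another way the same source can give different low-order bits.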

ANSWER: It turns out the root cause is that, in D on Windows, the floating-point state is initialized differently on the main thread than on other threads. See the bug report I just filed.
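Until the runtime initializes every thread the same way, one possible workaround is to capture the creating thread's floating-point environment and install it in the new thread before any arithmetic runs. The sketch below does this with the C99 `fegetenv`/`fesetenv` pair from `core.stdc.fenv`. `startWithCallerFpEnv` is a hypothetical helper, not part of druntime. Whether `fenv_t` carries the x87 precision-control bits depends on the platform's C runtime, so check on Windows (the `%a` output is a quick test) that it actually restores full precision rather than assuming it does.

```d
import core.stdc.fenv;
import core.thread;
import std.stdio;

/// Hypothetical helper: start a thread that runs `work` under a copy of
/// the calling thread's floating-point environment.
/// Assumption: fenv_t includes every control bit you care about.
Thread startWithCallerFpEnv(void delegate() work)
{
    fenv_t env;
    fegetenv(&env);        // snapshot the caller's FP environment now

    auto t = new Thread({
        fesetenv(&env);    // install it before the child does any FP work
        work();
    });
    t.start();
    return t;
}

void main()
{
    // Give the parent a non-default rounding mode so the copy is observable.
    fesetround(FE_UPWARD);

    int childMode;
    auto t = startWithCallerFpEnv({ childMode = fegetround(); });
    t.join();

    fesetround(FE_TONEAREST);   // restore the default in the parent
    writeln("child inherited FE_UPWARD: ", childMode == FE_UPWARD);
}
```

On platforms where new threads already start with the creator's settings, the extra `fesetenv` is harmless; the point is to make the child's state explicit instead of depending on how the OS or runtime initializes it.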


Comments (1)

巡山小妖精 2024-11-08 20:44:37


Here's a paper that explains the many reasons the same C code can lead to slightly different results. In your case, the most likely reason is CPU-internal instruction reordering.

It's simply wrong to expect floating-point calculations to be deterministic down to the low-order bits. That's not what floating-point numbers were designed to guarantee.
