How is the result of this single-precision operation rounded? [or why is this bit 1 instead of 0?]

Posted 2024-12-12 05:42:21 · 1866 characters · 0 views · 0 comments

I'm working on a function optimization routine (a variant of the Nelder-Mead algorithm) which fails to converge in very specific conditions.

I've identified that a float variable, let's call it a, is being assigned the mean of a and another variable b that differs from it only in the last bit.

More precisely, the values of the variables are as follows:

float a = 25.9735966f; // 41CFC9ED
float b = 25.9735947f; // 41CFC9EC

And now I'm trying to assign to a the mean between a and b:

a = 0.5 * (a+b);

When I write this code in a test program, I get the result I want, namely 25.9735947. But in the debugger of my original library code I see that the value of a remains 25.9735966. I'm pretty certain that I have the same compiler flags on both programs. Is there any reason why this single-precision calculation would yield different results?
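For anyone who wants to reproduce the test-program side of this, here is a minimal sketch. The helper names `float_bits` and `mean_of` are made up for illustration and do not come from the original code; the sketch assumes a 32-bit IEEE 754 `float` and the default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float
   (assumes float is a 32-bit IEEE 754 single). */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

/* Hypothetical helper mirroring the assignment a = 0.5 * (a+b). */
float mean_of(float a, float b)
{
    volatile float sum = a + b; /* volatile: keep the sum as a rounded float */
    return (float)(0.5 * sum);  /* 0.5 is a double, as in the question */
}
```

Printing `float_bits(mean_of(25.9735966f, 25.9735947f))` with `%08X` should give 41CFC9EC under the default rounding mode, which matches the test-program result described above.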

UPDATE

As @PascalCuoq requested, here is what I think is the assembly for the line in question. The line is doing a few other things though and I'm not sure where the multiplication happens.

.loc 1 53 0 discriminator 2
movl    -60(%rbp), %eax
cltq
salq    $3, %rax
addq    -88(%rbp), %rax
movq    (%rax), %rax
movl    -44(%rbp), %edx
movslq  %edx, %rdx
salq    $2, %rdx
leaq    (%rax,%rdx), %rcx
movl    -44(%rbp), %eax
cltq
salq    $2, %rax
addq    -72(%rbp), %rax
movl    -60(%rbp), %edx
movslq  %edx, %rdx
salq    $3, %rdx
addq    -88(%rbp), %rdx
movq    (%rdx), %rdx
movl    -44(%rbp), %esi
movslq  %esi, %rsi
salq    $2, %rsi
addq    %rsi, %rdx
movss   (%rdx), %xmm1
movl    -52(%rbp), %edx
movslq  %edx, %rdx
salq    $3, %rdx
addq    -88(%rbp), %rdx
movq    (%rdx), %rdx
movl    -44(%rbp), %esi
movslq  %esi, %rsi
salq    $2, %rsi
addq    %rsi, %rdx
movss   (%rdx), %xmm0
addss   %xmm1, %xmm0
movss   .LC6(%rip), %xmm1
mulss   %xmm1, %xmm0
movss   %xmm0, (%rax)
movl    (%rax), %eax
movl    %eax, (%rcx)

CLARIFICATION

My code is a ripoff variant of the Nelder-Mead code from Numerical Recipes. The offending line is this one:

p[i][j]=psum[j]=0.5*(p[i][j]+p[ilo][j]);

In this line, p[i][j] == 25.9735966f and p[ilo][j] == 25.9735947f. The resulting value in p[i][j] is 25.9735966f.
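As a reading aid, the tail of the assembly listing above appears to map onto this statement as follows: `addss` is the single-precision addition and `mulss` is the multiplication by the constant loaded from `.LC6`. The sketch below restates that in C. The function name `contract_vertex` is hypothetical, the mapping of stack offsets to `i`, `ilo` and `j` is inferred from the addressing pattern, and `.LC6` is assumed to hold 0.5f since its value is not shown in the listing.

```c
/* Hypothetical wrapper around the offending statement; only the
   statement itself comes from the question. */
void contract_vertex(float **p, float *psum, int i, int ilo, int j)
{
    float sum = p[i][j] + p[ilo][j];  /* addss %xmm1, %xmm0                 */
    float half = 0.5f * sum;          /* mulss %xmm1, %xmm0 (.LC6 assumed 0.5f) */
    psum[j] = half;                   /* movss %xmm0, (%rax)                */
    p[i][j] = psum[j];                /* movl (%rax), %eax; movl %eax, (%rcx) */
}
```

Multiplying by 0.5 is exact here, so any rounding that changes the result has to happen in the addition.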

Comments (1)

枯寂 2024-12-19 05:42:21

I just re-read the relevant part of IEEE 754-1985; I'm assuming that your floating-point implementation conforms to that standard. The only thing that comes to mind is that the rounding mode differs between your two environments. These are the possibilities:

  • round to nearest, ties to even (on a tie, pick the value whose least significant bit is zero) => 25.9735947f
  • round towards +INF => 25.9735966f
  • round towards 0 => 25.9735947f
  • round towards -INF => 25.9735947f

So the only possibility is that your debugging environment has rounding mode towards +INF. To me, there is no other plausible explanation.
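The four cases can be checked by hand on the bit patterns. Both values share the same exponent, so their 24-bit significands can be added as integers. The sum is odd, which means the exact value of a+b lies exactly halfway between two representable floats, and the rounding mode alone decides the outcome. The sketch below is an integer-only model of that step, not a call into real floating-point hardware. The names `rounding` and `mean_bits` are made up for illustration, and the code assumes two positive, normal floats with identical exponent fields whose sum does not overflow.

```c
#include <assert.h>
#include <stdint.h>

enum rounding { TO_NEAREST_EVEN, TOWARD_POS_INF, TOWARD_ZERO, TOWARD_NEG_INF };

/* Hypothetical model: bit pattern of 0.5f * (a + b) for two positive,
   normal floats with the same exponent, under the given rounding mode. */
uint32_t mean_bits(uint32_t a, uint32_t b, enum rounding mode)
{
    /* Precondition: same sign (positive) and same exponent field. */
    assert((a & 0xFF800000u) == (b & 0xFF800000u));

    uint32_t exp_field = a & 0x7F800000u;
    uint32_t sig_a = (a & 0x007FFFFFu) | 0x00800000u; /* add implicit 1 bit */
    uint32_t sig_b = (b & 0x007FFFFFu) | 0x00800000u;

    uint32_t sum = sig_a + sig_b;  /* exact 25-bit sum                      */
    uint32_t q = sum >> 1;         /* 24-bit significand one binade higher  */
    uint32_t tie = sum & 1u;       /* dropped bit set: exactly halfway      */

    if (tie) {
        switch (mode) {
        case TO_NEAREST_EVEN: q += q & 1u; break; /* up only if q is odd    */
        case TOWARD_POS_INF:  q += 1u;     break; /* positive: always up    */
        case TOWARD_ZERO:                         /* positive: truncate     */
        case TOWARD_NEG_INF:               break;
        }
    }
    if (q & 0x01000000u) {         /* rounding carried out of 24 bits       */
        q >>= 1;
        exp_field += 0x00800000u;
    }
    /* The sum lives one exponent higher; multiplying by 0.5 brings the
       exponent back down, so the original exponent field is reused. */
    return exp_field | (q & 0x007FFFFFu);
}
```

With a = 0x41CFC9ED and b = 0x41CFC9EC this yields 0x41CFC9EC for every mode except `TOWARD_POS_INF`, which yields 0x41CFC9ED, in line with the list above. On real hardware the same experiment could be run by calling the standard `fesetround` from `<fenv.h>` before the addition.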
