Final, non-canonical NaN double value changes during runtime

Posted 2024-11-15 21:01:19

I am writing Java code that interacts with R, where "NA" values are distinguished from NaN values. NA indicates that a value is "statistically missing", that is, it could not be collected or is otherwise not available.

class DoubleVector {
     public static final double NA = Double.longBitsToDouble(0x7ff0000000001954L);

     public static boolean isNA(double input) {
         return Double.doubleToRawLongBits(input) == Double.doubleToRawLongBits(NA);
     }

     /// ... 
}

The following unit test demonstrates the relationship between NaN and NA. It runs fine on my Windows laptop, but "isNA(NA) #2" sometimes fails on my Ubuntu workstation.

@Test
public void test() {

    assertFalse("isNA(NaN) #1", DoubleVector.isNA(DoubleVector.NaN));
    assertTrue("isNaN(NaN)", Double.isNaN(DoubleVector.NaN));
    assertTrue("isNaN(NA)", Double.isNaN(DoubleVector.NA));
    assertTrue("isNA(NA) #2", DoubleVector.isNA(DoubleVector.NA));
    assertFalse("isNA(NaN)", DoubleVector.isNA(DoubleVector.NaN));
}

From debugging, it appears that DoubleVector.NA is changed to the canonical NaN value 7ff8000000000000L, but it's hard to tell because printing it to stdout gives different values than the debugger.
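
One way to compare the values without depending on how the debugger or println renders them is to dump the raw bits in hex, since printing a double shows "NaN" for every NaN regardless of its payload. A minimal sketch; the class and method names are made up for illustration and are not part of the code above:

```java
final class NanBitsDump {
    // Formats the raw IEEE 754 bits of a double as a 16-digit hex string.
    // doubleToRawLongBits is used because doubleToLongBits would collapse
    // every NaN to the canonical pattern and hide the payload.
    static String toHex(double value) {
        return String.format("0x%016x", Double.doubleToRawLongBits(value));
    }

    public static void main(String[] args) {
        double na = Double.longBitsToDouble(0x7ff0000000001954L);
        System.out.println(na);                 // the text "NaN", payload not visible
        System.out.println(toHex(na));          // raw bits; may differ between platforms
        System.out.println(toHex(Double.NaN));  // raw bits of the standard NaN constant
    }
}
```

Logging toHex(DoubleVector.NA) at the start of each test could help narrow down when the bit pattern first changes, although passing the value to a method is itself a copy and may not be neutral.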

Also, the test only fails if it runs after a number of other previous tests; if I run this test alone, it always passes.

Is this a JVM bug? A side effect of optimization?

Tests always pass on:

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)

Tests sometimes fail on:

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Comments (1)

_蜘蛛 2024-11-22 21:01:19

You are treading in very dangerous waters here: this is one of the few areas where the Java VM's behaviour is not exactly specified.

According to the JVM spec, there is only "a NaN value" in the double range. No arithmetic operation on doubles could distinguish between two different NaN values.
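
As a rough illustration of that point, the sketch below shows that comparison treats the NA pattern like any other NaN, and that Double.doubleToLongBits (as opposed to doubleToRawLongBits) is documented to collapse every NaN to 0x7ff8000000000000L. The class and method names are hypothetical:

```java
final class NanArithmeticSketch {
    static final long CANONICAL_NAN_BITS = 0x7ff8000000000000L;

    // doubleToLongBits returns the canonical NaN pattern for any NaN input,
    // so the payload of the original bit pattern is lost here by design.
    static long collapsedBits(long bits) {
        return Double.doubleToLongBits(Double.longBitsToDouble(bits));
    }

    // A NaN never compares equal to anything, including itself.
    static boolean equalsItself(double value) {
        return value == value;
    }

    public static void main(String[] args) {
        long naBits = 0x7ff0000000001954L;
        double na = Double.longBitsToDouble(naBits);
        System.out.println(Double.isNaN(na));
        System.out.println(equalsItself(na));
        System.out.println(collapsedBits(naBits) == CANONICAL_NAN_BITS);
    }
}
```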

The documentation of longBitsToDouble() has this note:

Note that this method may not be able to return a double NaN with exactly same bit pattern as the long argument. IEEE 754 distinguishes between two kinds of NaNs, quiet NaNs and signaling NaNs. The differences between the two kinds of NaN are generally not visible in Java. Arithmetic operations on signaling NaNs turn them into quiet NaNs with a different, but often similar, bit pattern. However, on some processors merely copying a signaling NaN also performs that conversion. In particular, copying a signaling NaN to return it to the calling method may perform this conversion. So longBitsToDouble may not be able to return a double with a signaling NaN bit pattern. Consequently, for some long values, doubleToRawLongBits(longBitsToDouble(start)) may not equal start. Moreover, which particular bit patterns represent signaling NaNs is platform dependent; although all NaN bit patterns, quiet or signaling, must be in the NaN range identified above.
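
To make the note concrete, here is a small sketch of the round-trip check it describes, together with a test for the "quiet" bit. It assumes the common convention (used by x86, among others) that the most significant mantissa bit marks a quiet NaN; the note itself says this is platform dependent. Under that assumption your NA pattern 0x7ff0000000001954L has the quiet bit clear, which would make it a signaling NaN. All names below are hypothetical:

```java
final class NanRoundTripSketch {
    static final long EXPONENT_MASK = 0x7ff0000000000000L;
    static final long MANTISSA_MASK = 0x000fffffffffffffL;
    // Assumption: top mantissa bit set means "quiet" (platform dependent).
    static final long QUIET_BIT = 0x0008000000000000L;

    // All exponent bits set and a non-zero mantissa: some kind of NaN.
    static boolean isNaNPattern(long bits) {
        return (bits & EXPONENT_MASK) == EXPONENT_MASK && (bits & MANTISSA_MASK) != 0;
    }

    // A NaN pattern whose quiet bit is clear, under the assumption above.
    static boolean isSignalingPattern(long bits) {
        return isNaNPattern(bits) && (bits & QUIET_BIT) == 0;
    }

    // The check from the note: does the bit pattern survive long -> double -> long?
    static boolean survivesRoundTrip(long start) {
        return Double.doubleToRawLongBits(Double.longBitsToDouble(start)) == start;
    }

    public static void main(String[] args) {
        long naBits = 0x7ff0000000001954L;
        System.out.println(isSignalingPattern(naBits));
        // May differ between platforms, JVMs, and interpreted vs. compiled code.
        System.out.println(survivesRoundTrip(naBits));
    }
}
```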

So assuming that handling a double value will always keep the specific NaN value intact is a dangerous thing.

The cleanest solution would be to store your data as long and convert to double only after checking for your special value. This will have a quite noticeable performance impact, however.
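
A minimal sketch of what that could look like, assuming a vector backed by a long[] (the class name and its methods are invented for illustration, not taken from your code). The NA check compares long values only, so the NA pattern never has to survive being held in a double:

```java
final class LongBackedDoubleVector {
    // Same bit pattern as the NA constant in the question.
    static final long NA_BITS = 0x7ff0000000001954L;

    private final long[] bits;

    LongBackedDoubleVector(int length) {
        this.bits = new long[length];
    }

    // Stores an ordinary double by its raw bit pattern.
    void set(int index, double value) {
        bits[index] = Double.doubleToRawLongBits(value);
    }

    // Marks an element as NA without ever creating a double.
    void setNA(int index) {
        bits[index] = NA_BITS;
    }

    // Pure integer comparison; no floating-point copy is involved.
    boolean isNA(int index) {
        return bits[index] == NA_BITS;
    }

    // Converts to double only when the caller needs the numeric value.
    // For an NA element this yields some NaN, possibly with altered bits.
    double getDouble(int index) {
        return Double.longBitsToDouble(bits[index]);
    }

    public static void main(String[] args) {
        LongBackedDoubleVector v = new LongBackedDoubleVector(2);
        v.setNA(0);
        v.set(1, 2.5);
        System.out.println(v.isNA(0) + " " + v.isNA(1) + " " + v.getDouble(1));
    }
}
```

Callers would check isNA(i) first and call getDouble(i) only for the arithmetic, which is where the extra conversion cost comes from.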

You might get away with adding the strictfp modifier at the affected places. This doesn't in any way guarantee that it will work, but it will (possibly) change how your JVM handles floating-point values and might be just the hint that helps. It will still not be portable, however.
