What is the purpose of JS_CANONICALIZE_NAN in the SpiderMonkey engine?

Posted 2025-01-08 17:58:00, 53 characters, 1 view, 0 comments

I would like to know what the purpose of JS_CANONICALIZE_NAN is, and whether it is always needed on all platforms.

北方的巷 2025-01-15 17:58:00

This is a fun one! SpiderMonkey internally uses a tagged value representation for JavaScript's "untyped values". This allows the VM to determine things like "the value stored in a is a number, and the value stored in b is a number, so running a + b does numerical addition".

There are a bunch of different schemes for value tagging, and SpiderMonkey uses one that's referred to as "NaN boxing". This means that all untyped values in the engine are represented by 64-bit values that can be either:

a double, or
a tagged non-double that lives in the "NaN space" of IEEE double-precision floating point values.

The real trick here is that modern systems generally use a single bit pattern to represent NaN, which you can observe as the result of math.h's sqrt(-1) or log(-1), but there are a lot of other bit patterns that are also considered NaNs according to the IEEE floating point spec.

A double is composed of the sub-fields:

{sign: 1, exponent: 11, significand: 52}

NaNs are represented by filling the exponent field with 1s and placing a non-zero value in the significand.
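The field layout and the NaN rule above can be sketched in a few lines of C++. This is only an illustration: the names `DoubleFields`, `BitsOf`, `Decompose` and `IsNaNByBits` are made up for this example and are not SpiderMonkey APIs, and the code assumes a 64-bit IEEE 754 `double`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three sub-fields of an IEEE 754 double.
struct DoubleFields {
    uint64_t sign;         // 1 bit
    uint64_t exponent;     // 11 bits
    uint64_t significand;  // 52 bits
};

// Copy the raw bits out of the double; memcpy is the well-defined way to do this.
inline uint64_t BitsOf(double d) {
    static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Split the 64 bits into {sign: 1, exponent: 11, significand: 52}.
inline DoubleFields Decompose(double d) {
    const uint64_t bits = BitsOf(d);
    return DoubleFields{bits >> 63, (bits >> 52) & 0x7FF,
                        bits & 0x000FFFFFFFFFFFFFULL};
}

// NaN: exponent field all ones and a non-zero significand.
inline bool IsNaNByBits(double d) {
    const DoubleFields f = Decompose(d);
    return f.exponent == 0x7FF && f.significand != 0;
}
```

For example, `Decompose(1.0)` yields sign 0, exponent 0x3FF and significand 0. An infinity has an all-ones exponent with a zero significand, so `IsNaNByBits` rejects it.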

If you run a little program like this to see your platform's NaN values:

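#include <stdio.h>
#include <string.h>
#include <math.h>
#include <limits>

// Return the raw bit pattern of a double. memcpy avoids the undefined
// behavior of reading a double through an unsigned long long pointer.
static unsigned long long
DoubleAsULL(double d) {
    unsigned long long bits = 0;
    memcpy(&bits, &d, sizeof d);
    return bits;
}

int main() {
    // NaNs produced at run time by the math library.
    double sqrtNaN = sqrt(-1.0);
    printf("%5f 0x%llx\n", sqrtNaN, DoubleAsULL(sqrtNaN));
    double logNaN = log(-1.0);
    printf("%5f 0x%llx\n", logNaN, DoubleAsULL(logNaN));
    // NaN constants supplied by the compiler and standard library.
    double compilerNaN = NAN;
    printf("%5f 0x%llx\n", compilerNaN, DoubleAsULL(compilerNaN));
    double compilerSNAN = std::numeric_limits<double>::signaling_NaN();
    printf("%5f 0x%llx\n", compilerSNAN, DoubleAsULL(compilerSNAN));
    return 0;
}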

You'll see output like this:

 -nan 0xfff8000000000000 // Canonical qNaNs...
  nan 0x7ff8000000000000
  nan 0x7ff8000000000000
  nan 0x7ff4000000000000 // sNaN (signaling)

Note that the quiet NaNs differ only in the sign bit, which is always followed by 12 one bits: the 11 exponent bits plus the top significand bit. That satisfies the NaN requirement mentioned above. The last value, a signaling NaN, clears the 12th of those bits (the is_quiet bit) and sets the 13th instead, so the significand stays non-zero and the NaN invariant still holds.
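To make the quiet/signaling distinction concrete, here is a small sketch that classifies the bit patterns printed above. It assumes the common convention (used on x86 and ARM, among others) that a set top significand bit means "quiet". Some older architectures use the opposite convention. The constant and function names are invented for this example.

```cpp
#include <cstdint>

// Masks for a 64-bit IEEE 754 double.
constexpr uint64_t kExponentMask    = 0x7FF0000000000000ULL;  // 11 exponent bits
constexpr uint64_t kSignificandMask = 0x000FFFFFFFFFFFFFULL;  // 52 significand bits
constexpr uint64_t kQuietBit        = 0x0008000000000000ULL;  // top significand bit

// NaN: exponent all ones and significand non-zero.
constexpr bool IsNaNBits(uint64_t bits) {
    return (bits & kExponentMask) == kExponentMask &&
           (bits & kSignificandMask) != 0;
}

// Quiet NaN: a NaN whose top significand bit is set (assumed convention).
constexpr bool IsQuietNaNBits(uint64_t bits) {
    return IsNaNBits(bits) && (bits & kQuietBit) != 0;
}

// Signaling NaN: a NaN whose top significand bit is clear.
constexpr bool IsSignalingNaNBits(uint64_t bits) {
    return IsNaNBits(bits) && (bits & kQuietBit) == 0;
}

// The patterns from the sample output, checked at compile time.
static_assert(IsQuietNaNBits(0xfff8000000000000ULL), "-nan is quiet");
static_assert(IsQuietNaNBits(0x7ff8000000000000ULL), "nan is quiet");
static_assert(IsSignalingNaNBits(0x7ff4000000000000ULL), "sNaN is signaling");
```

Working on the raw bits rather than on `double` values keeps the check independent of how the hardware treats signaling NaNs when they are loaded into floating point registers.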

Other than that, the NaN space is free to play in: fill the 11 exponent bits with 1s, make sure the significand is non-zero, and you've got a lot of space left. On x64 we use a 47-bit virtual address assumption, which leaves us 64 - 47 - 11 = 6 bits for annotating value types. On x86 all object pointers fit in the lower 32 bits.
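As a rough illustration of how a tag and a 47-bit payload can share the NaN space, here is a simplified sketch. The layout is hypothetical and is not SpiderMonkey's actual encoding: it sets the sign bit and the quiet bit on every boxed value and uses only four of the spare bits as a type tag (values 1 to 15). The `Tag` values and all function names are invented for this example, and it assumes any NaN passed to `BoxDouble` is already in canonical form.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type tags; must be in the range 1..15 for this layout.
enum class Tag : uint64_t { Int32 = 1, Object = 2 };

constexpr unsigned kPayloadBits = 47;  // the 47-bit address assumption
constexpr uint64_t kPayloadMask = (uint64_t(1) << kPayloadBits) - 1;
// Sign bit, all 11 exponent bits and the quiet bit set. In this sketch,
// every 64-bit pattern at or below this value is treated as a double.
constexpr uint64_t kMaxDoubleBits = 0xFFF8000000000000ULL;

// Box a double by storing its bits unchanged (NaNs assumed canonical).
inline uint64_t BoxDouble(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Box a non-double: a non-zero tag above the payload lifts the pattern
// past kMaxDoubleBits, into NaN space that no ordinary double occupies.
inline uint64_t BoxPayload(Tag tag, uint64_t payload) {
    assert(payload <= kPayloadMask);
    return kMaxDoubleBits | (static_cast<uint64_t>(tag) << kPayloadBits) | payload;
}

inline bool IsDouble(uint64_t bits) { return bits <= kMaxDoubleBits; }

inline Tag TagOf(uint64_t bits) {
    return static_cast<Tag>((bits >> kPayloadBits) & 0xF);
}

inline uint64_t PayloadOf(uint64_t bits) { return bits & kPayloadMask; }
```

With this layout, `BoxPayload(Tag::Object, 0x1234)` produces `0xFFF9000000001234`, which is a negative quiet NaN when read as a double. A single unsigned comparison in `IsDouble` separates doubles from boxed values, and that comparison is only sound if no double carries a NaN payload above `kMaxDoubleBits`.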

However, we still need to make sure that non-canonical NaNs, if they creep in through something like js-ctypes, don't produce something that looks like a tagged non-double value, because that could lead to exploitable behavior in the VM. (Treating numbers as objects is very-much-so bad news bears.) So, when we form doubles (like in DOUBLE_TO_JSVAL) we make sure to canonicalize all doubles where d != d to the canonical NaN form.
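The canonicalization step itself is tiny. The following is a minimal sketch of the idea, not the engine's actual code: `CanonicalizeNaN`, `DoubleFromBits` and `BitsFromDouble` are names made up here, and the choice of `0x7ff8000000000000` as the canonical pattern is an assumption for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Assumed canonical quiet NaN pattern for this sketch.
constexpr uint64_t kCanonicalNaNBits = 0x7FF8000000000000ULL;

// Reinterpret 64 bits as a double.
inline double DoubleFromBits(uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Reinterpret a double as 64 bits.
inline uint64_t BitsFromDouble(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// NaN is the only value that compares unequal to itself, so d != d
// detects every NaN regardless of its sign bit or payload.
inline double CanonicalizeNaN(double d) {
    return d != d ? DoubleFromBits(kCanonicalNaNBits) : d;
}
```

A NaN with an attacker-chosen payload, such as `DoubleFromBits(0xFFFF000000001234ULL)`, comes out of `CanonicalizeNaN` as the plain canonical pattern, while ordinary numbers pass through unchanged. Note that the `d != d` test relies on IEEE comparison semantics, so it can be optimized away under compiler options such as `-ffast-math`.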

More info is in bug 584168.
