Usefulness of signaling NaN?

Posted on 2024-08-20 18:23:11

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating point exception in the cases where I don't want to proceed with "missing values." Conversely, I would use quiet NaN to allow the "missing value" to propagate through a computation. However, signaling NaNs don't work as I thought they would based on the (very limited) documentation that exists on them.

Here is a summary of what I know (all of this using x87 and VC++):

  • _EM_INVALID (the IEEE "invalid" exception) controls the behavior of the x87 when encountering NaNs
  • If _EM_INVALID is masked (the exception is disabled), no exception is generated and operations can return a quiet NaN. An operation involving a signaling NaN will not cause an exception to be thrown; instead, the signaling NaN is converted to a quiet NaN.
  • If _EM_INVALID is unmasked (exception enabled), an invalid operation (e.g., sqrt(-1)) causes an invalid exception to be thrown.
  • The x87 never generates signaling NaN.
  • If _EM_INVALID is unmasked, any use of a signaling NaN (even initializing a variable with it) causes an invalid exception to be thrown.
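
The masked case in the list above can be observed without any VC++-specific calls. The following is a minimal sketch that uses the standard <cfenv> interface rather than _controlfp and _EM_INVALID; the helper name sqrt_raises_invalid is made up for illustration. It assumes the default environment, where the invalid exception is masked, so nothing is thrown and only the sticky status flag is inspected.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: reports whether sqrt(x) raised the IEEE "invalid" flag.
// The exception is assumed to be masked (the default), so no trap occurs;
// only the sticky status flag is examined.
bool sqrt_raises_invalid(double x) {
    volatile double input = x;            // volatile discourages constant folding
    std::feclearexcept(FE_INVALID);       // start from a clean flag
    volatile double result = std::sqrt(input);
    (void)result;                         // the value itself is not needed here
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With this helper, sqrt_raises_invalid(-1.0) should report true and sqrt_raises_invalid(4.0) should report false. Some compilers only honor floating-point environment access under specific options, so treat the result as indicative rather than guaranteed.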

The Standard Library provides a way to access the NaN values:

std::numeric_limits<double>::signaling_NaN();

and

std::numeric_limits<double>::quiet_NaN();

The problem is that I see no use whatsoever for the signaling NaN. If _EM_INVALID is masked it behaves exactly the same as quiet NaN. Since no NaN is comparable to any other NaN, there is no logical difference.

If _EM_INVALID is not masked (the exception is enabled), then one cannot even initialize a variable with a signaling NaN:

double dVal = std::numeric_limits<double>::signaling_NaN();

This throws an exception, because the signaling NaN value is loaded into an x87 register in order to store it to the memory address.

You may think the following as I did:

  1. Mask _EM_INVALID.
  2. Initialize the variable with signaling NaN.
  3. Unmask _EM_INVALID.

However, step 2 causes the signaling NaN to be converted to a quiet NaN, so subsequent uses of it will not cause exceptions to be thrown! So WTF?!

Is there any utility or purpose whatsoever to a signaling NaN? I understand one of the original intents was to initialize memory with it so that use of an uninitialized floating point value could be caught.

Can someone tell me if I am missing something here?


EDIT:

To further illustrate what I had hoped to do, here is an example:

Consider performing mathematical operations on a vector of data (doubles). For some operations, I want to allow the vector to contain a "missing value" (pretend this corresponds to a spreadsheet column, for example, in which some of the cells do not have a value, but their existence is significant). For some operations, I do not want to allow the vector to contain a "missing value." Perhaps I want to take a different course of action if a "missing value" is present in the set -- perhaps performing a different operation (thus this is not an invalid state to be in).

The original code would look something like this:

#include <cmath>
#include <vector>

const double MISSING_VALUE = 1.3579246e123;
using std::sqrt;
using std::vector;

vector<double> missingAllowed(1000000, MISSING_VALUE);
vector<double> missingNotAllowed(1000000, MISSING_VALUE);

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it); // sqrt() could be any operation
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it);
    else *it = 0;
}

Note that the check for the "missing value" must be performed every loop iteration. While I understand in most cases, the sqrt function (or any other mathematical operation) will likely overshadow this check, there are cases where the operation is minimal (perhaps just an addition) and the check is costly. Not to mention the fact that the "missing value" takes a legal input value out of play and could cause bugs if a calculation legitimately arrives at that value (unlikely though it may be). Also to be technically correct, the user input data should be checked against that value and an appropriate course of action should be taken. I find this solution inelegant and less-than-optimal performance-wise. This is performance-critical code, and we definitely do not have the luxury of parallel data structures or data element objects of some sort.

The NaN version would look like this:

#include <cmath>
#include <limits>
#include <vector>

using std::sqrt;
using std::vector;

vector<double> missingAllowed(1000000, std::numeric_limits<double>::quiet_NaN());
vector<double> missingNotAllowed(1000000, std::numeric_limits<double>::signaling_NaN());

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    *it = sqrt(*it); // if *it is a quiet NaN, sqrt(*it) is also a quiet NaN
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    try {
        *it = sqrt(*it);
    } catch (FPInvalidException&) { // user-defined type; assumes a translator installed via _set_se_translator
        *it = 0;
    }
}

Now the explicit check is eliminated and performance should be improved. I think this would all work if I could initialize the vector without touching the FPU registers...
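
The quiet-NaN half of this idea can be sketched on its own. The snippet below is only an illustration, with hypothetical helper names (sqrt_in_place, count_missing), assuming IEEE 754 doubles and the default (masked) exception state. It shows the loop running without a per-element check while missing cells stay NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Applies sqrt to every element with no per-element "missing" check.
// A quiet NaN input simply yields a NaN output.
void sqrt_in_place(std::vector<double>& values) {
    for (double& v : values) {
        v = std::sqrt(v);
    }
}

// Counts NaN entries afterwards. Note that this also counts NaNs produced
// by invalid inputs (e.g. sqrt of a negative number), not only cells that
// were marked missing to begin with.
std::size_t count_missing(const std::vector<double>& values) {
    std::size_t n = 0;
    for (double v : values) {
        if (std::isnan(v)) ++n;
    }
    return n;
}
```

For a vector holding 4.0, a quiet NaN, and 9.0, sqrt_in_place leaves 2.0, NaN, and 3.0, and count_missing reports one entry.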

Furthermore, I would imagine any self-respecting sqrt implementation checks for NaN and returns NaN immediately.

Comments (3)

絕版丫頭 2024-08-27 18:23:11

As I understand it, the purpose of signaling NaN is to initialize data structures. But, of course, runtime initialization in C runs the risk of having the NaN loaded into a float register as part of initialization, thereby triggering the signal, because the compiler isn't aware that this float value needs to be copied using an integer register.

I would hope that you could initialize a static value with a signaling NaN, but even that would require some special handling by the compiler to avoid having it converted to a quiet NaN. You could perhaps use a bit of casting magic to avoid having it treated as a float value during initialization.

If you were writing in ASM, this would not be an issue. But in C, and especially in C++, I think you will have to subvert the type system in order to initialize a variable with a signaling NaN. I suggest using memcpy.
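
A minimal sketch of the memcpy suggestion might look like the following. The constant and function names are hypothetical, and the bit pattern is just one of many valid signaling-NaN encodings for a 64-bit IEEE 754 double. Whether the compiler really keeps the copy out of floating-point registers is an assumption about code generation, not something the language guarantees.

```cpp
#include <cstdint>
#include <cstring>

// One possible signaling-NaN encoding for an IEEE 754 binary64 value:
// exponent bits all ones, quiet bit (bit 51) clear, nonzero payload.
const std::uint64_t kSignalingNaNBits = 0x7FF0000000000001ULL;

// Writes the signaling-NaN bit pattern into *dst as raw bytes, so the value
// is never handled as a floating-point operand by this function.
void store_signaling_nan(double* dst) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::memcpy(dst, &kSignalingNaNBits, sizeof *dst);
}

// Reads the raw bits back, again without treating them as a double.
std::uint64_t bits_of(const double* src) {
    std::uint64_t bits;
    std::memcpy(&bits, src, sizeof bits);
    return bits;
}
```

After calling store_signaling_nan(&d), bits_of(&d) should return the same pattern, which is a quick way to confirm that nothing quieted the NaN along the way.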

月牙弯弯 2024-08-27 18:23:11

Here are the bit-patterns of the different double NaNs:

A signalling NaN is represented by any bit pattern
between 7FF0000000000001 and 7FF7FFFFFFFFFFFF or
between FFF0000000000001 and FFF7FFFFFFFFFFFF

A quiet NaN is represented by any bit pattern
between 7FF8000000000000 and 7FFFFFFFFFFFFFFF or
between FFF8000000000000 and FFFFFFFFFFFFFFFF

Source: https://www.doc.ic.ac.uk/~eedwards/compsys/float/nan.html
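
The ranges above can be checked with plain integer arithmetic. This is a small sketch with made-up names (NaNKind, classify_bits). It assumes the convention implied by the ranges listed, where a set top mantissa bit means quiet, as on x86.

```cpp
#include <cstdint>

enum class NaNKind { NotNaN, Signaling, Quiet };

// Classifies a 64-bit pattern as it would be interpreted as an IEEE 754
// double, using only integer operations (no floating-point load involved).
NaNKind classify_bits(std::uint64_t bits) {
    const std::uint64_t exponent_mask = 0x7FF0000000000000ULL;
    const std::uint64_t mantissa_mask = 0x000FFFFFFFFFFFFFULL;
    const std::uint64_t quiet_bit     = 0x0008000000000000ULL;

    if ((bits & exponent_mask) != exponent_mask) return NaNKind::NotNaN; // finite value
    if ((bits & mantissa_mask) == 0) return NaNKind::NotNaN;             // +/- infinity
    return (bits & quiet_bit) ? NaNKind::Quiet : NaNKind::Signaling;
}
```

The sign bit is ignored, which matches the two ranges given for each kind of NaN.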

Disclaimer: As others have pointed out, casting magic is potentially dangerous and may cause undefined behavior. Using memcpy has been suggested as a safer alternative.

That being said, for academic purposes, or if you know it is safe on the intended hardware:

In theory, it seems like it should work to just have a const uint64_t whose bits have been set to those of a signaling NaN. As long as you treat it as an integer type, the signaling NaN is no different from any other integer. Then, barring architecture-specific special cases, maybe you could write it where you want through pointer casting. If it works as intended, it might even be faster than memcpy. For some embedded systems it might even be useful.

Example:

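const uint64_t sNan = 0xFFF7FFFFFFFFFFFF;
double myData[N]; // N: however many elements you need
...
// Write the bits through an integer pointer so that no FP register is involved.
// This cast technically violates strict aliasing; memcpy avoids that.
uint64_t* copier = (uint64_t*) &myData[index];
*copier = sNan & ~myErrorFlags;

我的痛♀有谁懂 2024-08-27 18:23:11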

Using special values (even NULL) can make your data a lot muddier and your code a lot messier. It would be impossible to distinguish between a QNaN result and a QNaN "special" value.

You might be better off maintaining a parallel data structure to track validity, or perhaps keeping your FP data in a different (sparse) data structure that only holds valid data.

This is fairly general advice; special values are very useful in certain cases (e.g. really tight memory or performance constraints), but as the context grows larger they can cause more difficulty than they're worth.
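
As a rough sketch of the parallel-structure idea, the validity information can live next to the values and be checked once, outside the numeric loop. The Column type and sqrt_if_complete function are hypothetical names used only for illustration, and the extra memory cost is exactly the trade-off mentioned above.

```cpp
#include <cmath>
#include <vector>

// Values plus a parallel presence mask (1 = present, 0 = missing).
struct Column {
    std::vector<double> values;
    std::vector<unsigned char> present;
};

// For an operation that does not allow missing values: scan the mask first,
// then run the numeric loop with no per-element check.
// Returns false and leaves the data untouched if any cell is missing.
bool sqrt_if_complete(Column& col) {
    for (unsigned char p : col.present) {
        if (!p) return false;
    }
    for (double& v : col.values) {
        v = std::sqrt(v);
    }
    return true;
}
```

Because the mask is separate from the data, a NaN produced by a calculation can no longer be confused with a cell that was missing from the start.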
