NaN 是关联容器的有效键值吗?

发布于 2024-12-14 20:51:44 字数 1077 浏览 3 评论 0原文

考虑 C++ 中以 double 键控的有序和无序关联容器。

NaN 是有效的键类型吗?

对于有序容器,我应该说“不”,因为它不尊重严格的弱排序。

对于无序的容器,我不知道。

以下是 GCC 4.6.2 中发生的情况:

#include <map>
#include <unordered_map>

#include <cmath>

#include <iostream>
#include <prettyprint.hpp>

int main()
{
  typedef std::map<double, int> map_type; // replace by "unorderd_map"

  map_type dm;
  double d = std::acos(5); // a good nan

  dm[d] = 2;
  dm[d] = 5;
  dm[d] = 7;

  std::cout << "dm[NaN] = " << dm[d] << ", dm = " << dm << std::endl;
}

对于有序映射,我得到:

dm[NaN] = 7, dm = [(nan, 7)]

对于无序映射,我得到:

dm[NaN] = 0, dm = [(nan, 0), (nan, 7), (nan, 5), (nan, 2)]

所以在有序映射中,所有 NaN 都被视为相同,这是我所期望的,尽管看起来 NaN 会违反要求。然而,对于无序映射,我永远无法再次检索元素,并且所有 NaN 都是不同的。这也不是我所期望的。

标准是否必须对此事作出任何规定?

更新:感谢下面的精彩回答,请注意,一旦 NaN 插入任何其他内容std::map 就会中断。在其中。

(我将非常感谢有关其他语言如何处理关联容器中的浮点键的评论。)

Consider the ordered and unordered associative containers in C++ keyed on double.

Is NaN a valid key type?

With ordered containers, I should say "no", because it does not respect the strict weak ordering.

With unordered containers, I have no idea.

Here's what happens in GCC 4.6.2:

#include <map>
#include <unordered_map>

#include <cmath>

#include <iostream>
#include <prettyprint.hpp>

int main()
{
  typedef std::map<double, int> map_type; // replace by "unorderd_map"

  map_type dm;
  double d = std::acos(5); // a good nan

  dm[d] = 2;
  dm[d] = 5;
  dm[d] = 7;

  std::cout << "dm[NaN] = " << dm[d] << ", dm = " << dm << std::endl;
}

For the ordered map, I get:

dm[NaN] = 7, dm = [(nan, 7)]

For the unordered map, I get:

dm[NaN] = 0, dm = [(nan, 0), (nan, 7), (nan, 5), (nan, 2)]

So in the ordered map, all NaNs are treated the same, which is what I expect, although it seemed like NaN would violate the requirements. For the unordered map, however, I can never retrieve an element again, and all NaNs are different. This is also not what I would expect.

Does the standard have to say anything on this matter?

Update: Thanks to the great answers below, note that the std::map will break if you insert anything else into it once a NaN is in it.

(I'd be very grateful for comments on how other languages handle floating point keys in associative containers.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

素染倾城色 2024-12-21 20:51:44

它们都是标准所禁止的。

对于(有序)关联容器,严格弱序(25.4/4)的定义如下:

如果我们将 equiv(a, b) 定义为 !comp(a, b) && !comp(b, a),然后
要求是 compequiv 都是传递关系......
等价(a,b)&& equal(b, c) 意味着 equiv(a, c)

对于 a = 0.0, b = NaN, c = 1.0, comp = std::less则失败;()

对于无序容器,23.2.5/3 表示等式谓词 Pred “在类型值上引入等价关系密钥”。等价关系是自反的,并且 std::equal_to()(NaN,NaN) 为 false,因此 equal_to() 不是等价关系。

顺便说一下,在双精度数上设置容器的键值有点可怕,就像比较双精度数是否相等总是有点可怕一样。你永远不知道你会在最不重要的位得到什么。

我一直认为有点奇怪的是,该标准根据键类型来表达要求,而不是根据添加到容器的实际键值。我相信您可以选择将其理解为:如果实现支持 NaN,则不保证 map 已定义行为,无论您是否实际将 NaN 添加到实例。但实际上,std::map 的实现无法以某种方式从其后袋中变出一个 NaN 并尝试比较它,它只比较传递给实例。所以如果你避免添加 NaN 的话应该没问题(如果有点可怕)。

对于其他语言如何处理的评论,我将非常感激
关联容器中的浮点键

Python 中的一些快速实验(其中 setdict 是无序关联容器,通过引用保存键和值)表明 NaN 被视为对象即使它们是“相同的 NaN”,它们的值也不相等,但是可以通过身份再次找到相同的 nan 对象。据我所见,容器似乎不会因包含多个 nan 或 nan 和其他值的混合而受到干扰:

>>> thing = set()
>>> nan = float('nan')
>>> nan
nan
>>> thing.add(nan)
>>> thing.add(nan)
>>> thing
set([nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan] = 2
>>> thing[nan]
2
>>> nan2 = float('nan')
>>> thing[nan2] = 3
>>> thing
{nan: 2, nan: 3}

>>> thing = set()
>>> thing.add(nan)
>>> thing.add(nan2)
>>> thing
set([nan, nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan2] = 2
>>> thing[0] = 3
>>> thing
{nan: 1, nan: 2, 0: 3}
>>> thing.keys()
[nan, nan, 0]
>>> thing.values()
[1, 2, 3]
>>> thing[0]
3
>>> thing[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1

They are both forbidden by the standard.

For the (ordered) associative containers, the definition of strict weak order (25.4/4) says:

If we define equiv(a, b) as !comp(a, b) && !comp(b, a), then the
requirements are that comp and equiv both be transitive relations ...
equiv(a, b) && equiv(b, c) implies equiv(a, c)

This fails for a = 0.0, b = NaN, c = 1.0, comp = std::less<double>()

For the unordered containers, 23.2.5/3 says that the equality predicate Pred "induces an equivalence relation on values of type Key". Equivalence relations are reflexive, and std::equal_to<double>()(NaN,NaN) is false, so equal_to<double>() is not an equivalence relation.

By the way, keying containers on a double is slightly scary in the same way that comparing doubles for equality is always slightly scary. You never know what you're going to get in the least significant bit.

Something I've always considered a little odd is that the standard expresses the requirements in terms of the key type, not in terms of the actual key values added to the container. I believe you could choose to read this as not guaranteeing that map<double, int> has defined behaviour at all if the implementation supports NaNs, regardless of whether you actually add a NaN to an instance or not. In practice, though, an implementation of std::map cannot somehow conjure a NaN out of its back pocket and try to compare it, it only ever compares key values passed to the instance. So it should be OK (if slightly scary) provided you avoid adding NaNs.

I'd be very grateful for comments on how other languages handle
floating point keys in associative containers

A few quick experiments in Python (where set and dict are unordered associative containers which hold keys and values by reference) suggest that NaNs are treated as objects that are unequal in value even if they're "the same NaN", but the same nan object can be found again by identity. As far as I've seen, the containers don't seem to be disturbed by containing multiple nans, or a mixture of nans and other values:

>>> thing = set()
>>> nan = float('nan')
>>> nan
nan
>>> thing.add(nan)
>>> thing.add(nan)
>>> thing
set([nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan] = 2
>>> thing[nan]
2
>>> nan2 = float('nan')
>>> thing[nan2] = 3
>>> thing
{nan: 2, nan: 3}

>>> thing = set()
>>> thing.add(nan)
>>> thing.add(nan2)
>>> thing
set([nan, nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan2] = 2
>>> thing[0] = 3
>>> thing
{nan: 1, nan: 2, 0: 3}
>>> thing.keys()
[nan, nan, 0]
>>> thing.values()
[1, 2, 3]
>>> thing[0]
3
>>> thing[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
我的影子我的梦 2024-12-21 20:51:44

这是因为

std::less<double>(NaN, NaN) == false

就像你说的,弱总排序(std::map<>所需)是可以的,相等(或等价,任何散列的额外要求基于容器)无法满足哈希(无序)映射的关键要求

Irreflexivity   f(x, x) must be false. 
 Antisymmetry   f(x, y) implies !f(y, x) 
 Transitivity   f(x, y) and f(y, z) imply f(x, z). 
 Transitivity of equivalence     

         Equivalence (as defined above) is transitive: if x 
         is equivalent to y and y is equivalent to z, then x is 
         equivalent to z. (This     implies that equivalence does 
         in fact satisfy the mathematical definition of an equivalence 
         relation.)

看到对于 std::map 来说,等价于 !less(a,b) && !less(b,a),我想说所有约束都得到满足。

This is because

std::less<double>(NaN, NaN) == false

Like you said, the weak total ordering (required for std::map<>) is ok, the equality (or equivalence, extra requirement for any hash-based container) isn't ok to satisfy the key requirements for the hash (unordered) map

Irreflexivity   f(x, x) must be false. 
 Antisymmetry   f(x, y) implies !f(y, x) 
 Transitivity   f(x, y) and f(y, z) imply f(x, z). 
 Transitivity of equivalence     

         Equivalence (as defined above) is transitive: if x 
         is equivalent to y and y is equivalent to z, then x is 
         equivalent to z. (This     implies that equivalence does 
         in fact satisfy the mathematical definition of an equivalence 
         relation.)

Seeing that for std::map, equivalence is when !less(a,b) && !less(b,a), I'd say all constraints are met.

梦纸 2024-12-21 20:51:44

NaN 可以存储在映射内——也就是说,它们是可复制构造的并且不具有可比性。双精度的 std::less 不满足地图对严格弱排序的要求,因此从技术上讲,您在这里遇到了未定义的行为。然而,即使标准没有要求,该行为也很容易解释。 Map 使用等价而不是相等来确定项目是否重复。两个 NaN 比较相等,但不相等。然而,在某些情况下,这种情况会失效。例如,如果您尝试在该映射中插入 NaN 以外的内容,它将被视为与 NaN 等效,并且您将得不到插入。除了 NaN 之外,尝试添加一些实数,您也可以看到映射崩溃了。

哈希行为是预期的,但也没有为哈希表定义 - 哈希表要求其内容可复制构造和相等可比较。多个 NaN 的哈希值比较相等,因此它们都将进入同一个存储桶,但哈希表使用相等比较而不是小于比较(相等而不是等价)。因此,没有一个 NaN 相互比较相等,并且您会对该键进行多次插入。这是因为 NaN 破坏了哈希表的相等可比较要求——即 std::equal_to(x, x) true 的不变量。

NaNs can be stored inside a map -- namely, they are copy constructable and less than comparable. std::less for doubles doesn't meet map's requirements for a strict weak ordering, so you've technically got undefined behavior here. However, the behavior is easily explained, even if not required by the standard. Map uses equivalence rather than equality to determine whether an item is a duplicate. Two NaNs compare equivalent, but not equal. However, that falls apart in some cases. For example, if you tried to insert something other than a NaN into that map, it would be treated as equivalent to the NaN and you'd get no insert. Try adding some real numbers in addition to the NaNs and you can see map breaks down too.

The hash behavior is expected but not defined for a hash table as well -- hash tables require their contents to be copy construcable and equality comparable. The hashes of multiple NaNs compare equal, so they're all going to go into the same bucket, but a hash table uses equality comparison rather than less than comparison (equality rather than equivalence). Therefore, none of the NaNs compare equal to each other and you get multiple inserts for that key. This is because NaN breaks the hash table's equality comparable requirement -- namely the invariant that std::equal_to(x, x) true.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文