Is there any way to get a hashcode of a float with epsilon?

Posted 2024-07-14 06:13:53

It is well known that comparing floats by == is usually a mistake. In a 3D-vector class (with float components X, Y, Z) I wrote, two vectors are considered equal if their distance is considered zero.

public override bool Equals(object obj)
{
    if (obj == null) {
        return false;
    }

    if (GetType () != obj.GetType ()) {
        return false;
    }

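    // Note: this compares the *squared* distance with VEC_COMPARE_EPSILON_ABS,
    // so the effective distance tolerance is sqrt(1E-05), roughly 0.0032.
    float d = DistSq ((Vec) obj);

    return IsConsideredZero (d);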
}

public float DistSq(Vec p)
{
    Vec d = this - p;
    return d.LengthSq ();
}

public float LengthSq()
{
    return X * X + Y * Y + Z * Z;
}

private const float VEC_COMPARE_EPSILON_ABS = 1E-05f;
public static bool IsConsideredZero(float f)
{
    return Math.Abs (f) < VEC_COMPARE_EPSILON_ABS;
}

So far, everything worked fine. However, now I'd like to get a hashcode of the vector. I can see that something like hash = (int)X^(int)Y^(int)Z is bound to fail, since two vectors that Equals considers equal can still truncate to different integers.
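To make that failure concrete, here is a minimal sketch written as a C# 9+ top-level program. The names VecEquals and NaiveHash are hypothetical, and a (float X, float Y, float Z) tuple stands in for the Vec class; VecEquals mirrors the Equals above (squared distance below 1E-05).

```csharp
using System;

// Hypothetical stand-in for Vec.Equals: squared distance below 1E-05.
static bool VecEquals((float X, float Y, float Z) a, (float X, float Y, float Z) b)
{
    float dx = a.X - b.X, dy = a.Y - b.Y, dz = a.Z - b.Z;
    return Math.Abs(dx * dx + dy * dy + dz * dz) < 1E-05f;
}

// The naive hash from the question: truncate each component to int and XOR.
static int NaiveHash((float X, float Y, float Z) v) => (int)v.X ^ (int)v.Y ^ (int)v.Z;

var v1 = (0.999f, 0f, 0f);
var v2 = (1.0f, 0f, 0f);
Console.WriteLine(VecEquals(v1, v2));               // True: the vectors are about 0.001 apart
Console.WriteLine(NaiveHash(v1) == NaiveHash(v2));  // False: (int)0.999f is 0, (int)1.0f is 1
```

Any hash built by cutting the number line at fixed points has the same weakness: two values just either side of a cut compare equal but hash differently.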

The best I could come up with was:

public override int GetHashCode()
{
    return 0;
}

This, of course, kind of sucks. Is there any way to get a reasonable hashcode? NaNs and other special values are possible, but unlikely, in case that is important.

Comments (5)

¢好甜 2024-07-21 06:13:53

It's impossible, assuming you want to have the normal hashcode/equality properties:

  • If X = Y and Y = Z then X = Z (transitivity)
  • If X = Y then Y = X (symmetry)
  • X = X for all X (reflexivity)

The first rule is the problem, because if each value is deemed "equal" to the next greater representable number, you end up with all numbers being equal. For instance, suppose a number is deemed equal to another if they're within 0.1:

0 equals 0.08
0.08 equals 0.16
0.16 equals 0.24

=> 0 equals 0.16 by the transitivity rule
=> 0 equals 0.24 by the transitivity rule

(etc)

If you ignore the transitivity rule, then you still (presumably) want "equal" values to have equal hashcodes. This effectively enforces the transitivity rule: in the above example, 0 and 0.08 have to have equal hashcodes, as do 0.08 and 0.16. Therefore 0 and 0.16 have to have equal hashcodes, and so on. Therefore you can have no useful hashcode; it has to be a constant.
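The chain above can be checked directly. This is a small sketch as a C# 9+ top-level program; Near is a hypothetical helper implementing the answer's "within 0.1" rule for scalars. Every link in the chain is "equal", yet the ends are not, which is exactly the non-transitivity that forces any consistent hash to be constant.

```csharp
using System;

// Hypothetical tolerance comparison from the answer's example: "equal" if within tol.
static bool Near(double a, double b, double tol) => Math.Abs(a - b) < tol;

// Walk the chain 0, 0.08, 0.16, 0.24: each neighbour is "equal" to the next...
double[] chain = { 0.0, 0.08, 0.16, 0.24 };
for (int i = 0; i + 1 < chain.Length; i++)
    Console.WriteLine($"{chain[i]} ~ {chain[i + 1]}: {Near(chain[i], chain[i + 1], 0.1)}"); // each line ends in True

// ...but the ends of the chain are not "equal" to each other.
Console.WriteLine($"0 ~ 0.24: {Near(0.0, 0.24, 0.1)}"); // ends in False
```

A hash that respects Near would have to give 0 and 0.08 the same value, then 0.08 and 0.16, and so on along any chain, which is why it collapses to a constant.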

紫瑟鸿黎 2024-07-21 06:13:53

I don't think you can have a hashcode that is consistent with your comparison method, because the latter is not transitive: for three vectors A, B, C, A.Equals(B) and B.Equals(C) can both be true while A.Equals(C) is false. (Imagine that the distance between A and B is 0.002, between B and C is 0.002, and between A and C is 0.004: since Equals compares the squared distance with 1E-05, the first two pairs are equal but the last is not.) But equality of hashcodes is always transitive, since they're just numbers.
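A minimal sketch of that counterexample, as a C# 9+ top-level program. VecEquals is a hypothetical stand-in that mirrors the question's Equals (squared distance compared with 1E-05), with a (float X, float Y, float Z) tuple in place of the Vec class.

```csharp
using System;

// Hypothetical stand-in for Vec.Equals: squared distance below 1E-05.
static bool VecEquals((float X, float Y, float Z) a, (float X, float Y, float Z) b)
{
    float dx = a.X - b.X, dy = a.Y - b.Y, dz = a.Z - b.Z;
    return Math.Abs(dx * dx + dy * dy + dz * dz) < 1E-05f;
}

var vA = (0f, 0f, 0f);
var vB = (0.002f, 0f, 0f);  // 0.002 from A: squared distance about 4E-06
var vC = (0.004f, 0f, 0f);  // 0.002 from B, 0.004 from A: squared distance about 1.6E-05

Console.WriteLine(VecEquals(vA, vB)); // True
Console.WriteLine(VecEquals(vB, vC)); // True
Console.WriteLine(VecEquals(vA, vC)); // False: Equals is not transitive
```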

In this case, I'd just create a hashcode method that computes the hash based on the exact values of the floating-point coordinates, and mention in the documentation that it's inconsistent with equals. I know it's not really a solution but given that I don't think a real solution exists, it's better to have a nontrivial hashcode than just 0.
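One way to sketch the "hash the exact values" suggestion, assuming .NET Core 2.1 or later where System.HashCode is available. ExactHash is a hypothetical top-level helper; inside the Vec class the equivalent would be an override returning HashCode.Combine(X, Y, Z).

```csharp
using System;

// Hypothetical helper: hash the exact float components.
// Caveat (to be documented, as the answer says): vectors that Equals treats
// as equal can still receive different hash codes.
static int ExactHash(float x, float y, float z) => HashCode.Combine(x, y, z);

Console.WriteLine(ExactHash(1f, 2f, 3f) == ExactHash(1f, 2f, 3f)); // True: same inputs, same hash

// Equals would call these two vectors equal, but nothing ties their hashes together.
Console.WriteLine(ExactHash(1f, 2f, 3f) == ExactHash(1.0001f, 2f, 3f));
```

HashCode.Combine is seeded per process, so the values themselves should not be persisted or compared across runs; only equality within one process is meaningful.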

萌无敌 2024-07-21 06:13:53

I'm afraid it isn't possible in the general case. A sketch of a proof goes like this:

Take any two numbers a and b, and let the difference between them be d. Then step from a to b in increments slightly smaller than epsilon (roughly d/epsilon steps). Each number must be "equal" to the one before it, so by hashcode semantics it must have the same hashcode. Hence a and b have the same hashcode, and since a and b were arbitrary, all numbers must have the same hashcode.

You can only solve this problem if you add some other constraint.

As an aside, your definition of Equals is wrong as well, as it can be true that a.Equals(b) and b.Equals(c) but not a.Equals(c), which is wrong for Equals. This is known as breaking the transitivity property.

What can I do then?

The solution depends on what you are using the hash for. One solution would be to introduce a conceptual grid: change Equals and GetHashCode so that two vectors are equal if they lie in the same grid cube, by rounding each coordinate to a constant number of decimal places and then taking Equals and GetHashCode on the rounded values. If being close to zero is an important case, add an offset of epsilon/2 before rounding, so that zero is the centre of a cube. This is consistent, but two numbers can be arbitrarily close together (within the limits of float) without being equal. So for some applications it will be OK, for others it won't. This is similar to an idea from mghie.
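A minimal sketch of the grid idea, as a C# 9+ top-level program. The helper names (Cell, CellOf, GridEquals, GridHash) and the cell size of 0.01 are hypothetical choices for illustration. Each coordinate is floored after adding half a cell, so cell 0 is centred on zero, and both equality and hashing are defined on the resulting integer cell indices, which keeps them consistent with each other.

```csharp
using System;

// Cell index of one coordinate; the half-cell offset centres cell 0 on zero.
static long Cell(float v, double size) => (long)Math.Floor((v + size / 2) / size);

static (long, long, long) CellOf((float X, float Y, float Z) v, double size) =>
    (Cell(v.X, size), Cell(v.Y, size), Cell(v.Z, size));

// Equality and hash both depend only on the cell, so equal vectors hash equally.
static bool GridEquals((float X, float Y, float Z) a, (float X, float Y, float Z) b, double size) =>
    CellOf(a, size) == CellOf(b, size);

static int GridHash((float X, float Y, float Z) v, double size) => CellOf(v, size).GetHashCode();

var p = (0.004f, 0f, 0f);
var q = (-0.004f, 0f, 0f);
var r = (0.006f, 0f, 0f);
Console.WriteLine(GridEquals(p, q, 0.01)); // True: both lie in the cell centred on 0
Console.WriteLine(GridEquals(p, r, 0.01)); // False: only 0.002 apart, but in neighbouring cells
```

The second line shows the trade-off the answer mentions: the relation is now transitive, but two nearby points on opposite sides of a cell boundary are not equal.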

淡写薰衣草的香 2024-07-21 06:13:53

Everybody is correct ...

HOWEVER, one thing that is often done is to extend the concept of a hash a bit. Consider partitioning your 3D space into boxes with a side >> epsilon.

The hash of a point is the box it belongs to.
When you want to look up a point, you don't check only the point's own box (as you would with a regular hash) but the neighboring boxes as well. In 3D you should get away with at most 8 boxes: the point's own box plus the neighbors on whichever sides the point is within epsilon of.
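A minimal sketch of this spatial-hash lookup, as a C# 9+ top-level program. All names (BoxOf, Add, TryFindNear) and the parameters side and maxDist are hypothetical; side must be at least maxDist. For brevity it scans the point's own box and all 26 neighbours rather than the answer's tighter "at most 8" selection.

```csharp
using System;
using System.Collections.Generic;

// Key of a point: integer index of the box (of side `side`) that contains it.
static (long, long, long) BoxOf((float X, float Y, float Z) p, float side) =>
    ((long)Math.Floor(p.X / side), (long)Math.Floor(p.Y / side), (long)Math.Floor(p.Z / side));

static void Add(Dictionary<(long, long, long), List<(float X, float Y, float Z)>> grid,
                (float X, float Y, float Z) p, float side)
{
    var key = BoxOf(p, side);
    if (!grid.TryGetValue(key, out var bucket))
        grid[key] = bucket = new List<(float X, float Y, float Z)>();
    bucket.Add(p);
}

// Look for a stored point closer than maxDist, scanning the own box and its neighbours.
static bool TryFindNear(Dictionary<(long, long, long), List<(float X, float Y, float Z)>> grid,
                        (float X, float Y, float Z) p, float side, float maxDist,
                        out (float X, float Y, float Z) match)
{
    var (bx, by, bz) = BoxOf(p, side);
    for (long dx = -1; dx <= 1; dx++)
    for (long dy = -1; dy <= 1; dy++)
    for (long dz = -1; dz <= 1; dz++)
    {
        if (!grid.TryGetValue((bx + dx, by + dy, bz + dz), out var bucket)) continue;
        foreach (var q in bucket)
        {
            float ex = p.X - q.X, ey = p.Y - q.Y, ez = p.Z - q.Z;
            if (ex * ex + ey * ey + ez * ez < maxDist * maxDist) { match = q; return true; }
        }
    }
    match = default;
    return false;
}

var index = new Dictionary<(long, long, long), List<(float X, float Y, float Z)>>();
Add(index, (0.9995f, 0f, 0f), 1f);
Console.WriteLine(TryFindNear(index, (1.0005f, 0f, 0f), 1f, 0.01f, out _)); // True: found in the neighbouring box
Console.WriteLine(TryFindNear(index, (5f, 5f, 5f), 1f, 0.01f, out _));      // False
```

Unlike a plain hash, this never misses a match across a box boundary, at the price of several dictionary probes per lookup.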

女皇必胜 2024-07-21 06:13:53

Whatever technique you use will have problems because you posed something that isn't possible to solve.

What you want is 1) an evenly distributed hash, such that for most values a and b where a != b, a.GetHashCode() != b.GetHashCode(), but 2) where a == b, a.GetHashCode() == b.GetHashCode() must be true.

Returning a constant fulfills (2) but not (1).

You can demonstrate that rounding at 1E-5 boundaries and using the result as a hash fulfills (1) but violates (2). Take 1.49E-5 and 1.51E-5, for example: rounding produces two different hash values (1E-5 and 2E-5), yet the two values compare equal. This violates constraint (2) above. You can easily generalize this to prove that any rounding of the number will run into a similar problem.

I recommend you choose a different approach. I assume the underlying problem is determining whether some point is close to a point you already have. I recommend recursively dividing the coordinate space in half, placing points near a boundary (i.e. within 1E-5 of it) in both halves. If you progressively divide your space (think binary tree), you can construct a data structure that quickly returns the result you want and is fairly easy to construct.

If I missed my guess and you must use a hash, then you can do what you want with two hash values, each rounding to a grid coarser than the tolerance (e.g. cells of 2E-5), with the second grid offset by half a cell (1E-5). For a single coordinate, any two values that compare equal will then share at least one of the two hash values. This would require you to enter each point in the hash table twice, once for each hash routine.
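A minimal one-coordinate sketch of the two-hash idea, as a C# 9+ top-level program; Key0, Key1 and ShareKey are hypothetical names. It assumes cells of twice the tolerance, with the second grid shifted by one tolerance: the combined boundaries are then one tolerance apart, so two values closer than the tolerance can straddle a boundary of at most one grid. (With cells equal to the tolerance, a pair can straddle a boundary of both grids.)

```csharp
using System;

// Grid 0: cells of width 2*tol. Grid 1: the same cells shifted by tol.
static long Key0(double v, double tol) => (long)Math.Floor(v / (2 * tol));
static long Key1(double v, double tol) => (long)Math.Floor((v + tol) / (2 * tol));

// A value is stored under both keys; a lookup matches if either key is shared.
static bool ShareKey(double a, double b, double tol) =>
    Key0(a, tol) == Key0(b, tol) || Key1(a, tol) == Key1(b, tol);

const double Tol = 1E-5;
Console.WriteLine(Key0(1.99E-5, Tol) == Key0(2.01E-5, Tol)); // False: straddles a grid-0 boundary
Console.WriteLine(Key1(1.99E-5, Tol) == Key1(2.01E-5, Tol)); // True: same grid-1 cell
```

For full 3D vectors two grids are not enough, since a pair can straddle a grid-0 boundary in X and a grid-1 boundary in Y at the same time; more shifted grids, or the neighbour-box lookup from the previous answer, would be needed.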
