为近似相似的数字生成相同的哈希码

发布于 2024-08-21 10:48:02 字数 1226 浏览 6 评论 0原文

我正在 C# 3.5 中创建一个应用程序，该应用程序使用 AutoCAD API 读取 2D AutoCAD 绘图，使用定义的业务逻辑对绘图进行更改，然后在 AutoCAD 中将其调整回来。由于逻辑的本质，绘图的形状必须重新构建 - 例如，矩形由 4 条连接直线组成。

我使用 AutoCAD 中每条线的起始和结束坐标创建这些形状，但某些坐标并不完全匹配。例如，一个点可能位于 0.69912839（在一个轴上），但从同一点开始的一条线可能是 0.69990821。这些以毫米为单位，因此距离是分钟（0.00078毫米！）

我创建了自己的类（称为 MyPoint，类似于 PointF），因为我需要向其添加一些额外的逻辑。在该类中，我创建了一个方法，该方法接受两个双精度值并返回 true 或 false，具体取决于两个点彼此的距离是否在 0.001mm 以内。然后我重写了 Equals 方法、 == 和 != 运算符，这样我就可以执行 (point1 == point2 或 point1.Equals(point2)) 来检查所有轴是否彼此在 0.001mm 以内 - 如果是，我将其归类为同一点。

这很好，而且工作得很好。现在，我需要检查这些点类的集合以消除所有重复项，因此我在集合上使用 LINQ 的 Distinct() 方法。但是，此方法使用 GetHashcode() 而不是 Equals() 来确定实例是否相等。因此，我重写了 GetHashcode()，它使用 double 类的 GetHashcode。

但是，上面的示例失败了，因为显然它们是不同的值，因此生成不同的哈希码。有没有办法让两个相差在 0.001 以内的数字可以生成相同的哈希码？（请注意，这些数字彼此并不了解，因为 GetHashcode 是在不同的类实例上单独调用的。）我尝试了多种方法，这些方法适用于某些示例，但不适用于其他示例。

一个例子是将数字截断为 3dp（乘以 10^3，然后截断它）并在结果上创建哈希码 - 这适用于上面的示例 (699 == 699。) 但这不适用于 0.69990821 和0.70000120（699！= 700。）我尝试过四舍五入，这适用于第二组数字（0.700 == 0.700），但不适用于第一组数字（0.699！= 0.700。）我什至尝试将数字截断为3dp然后将其调整到下一个偶数，这适用于前面的两个示例，但不适用于 12.9809 和 12.9818 (12980 != 12982。)

还有其他方法吗，或者我应该废弃 Equals、==、!= 和 GetHashcode覆盖，并创建我自己的 MyPoint.IsEqualTo() 和 MyPointCollection.Distinct() 方法？

原文

I'm creating an application in C# 3.5 that uses the AutoCAD API to read a 2D AutoCAD drawing, make changes to the drawing using defined business logic, then adjust it back in AutoCAD. Due to the nature of the logic, the shape of the drawing has to be re-constructed - e.g. a rectangle is made up of 4 connecting straight lines.

I'm creating these shapes using the start and end co-ordinates of each line from AutoCAD, but some of the co-ordinates don't exactly match up. For example, one point could at 0.69912839 (on one axis), but a line starting from the same point could be 0.69990821. These are in mm so the distance is minute (0.00078mm!)

I've created my own class (call it MyPoint, similar to PointF) because I've needed to add some additional logic to it. In that class I've created a method that takes two doubles and returns true or false depending on if the two points are within 0.001mm of each other. I've then overridden the Equals method, == and != operators so I can do (point1 == point2 or point1.Equals(point2)) which checks if all axis are within 0.001mm of each other - if they are, I class it as being the same point.

That's fine and working brilliantly. Now, I need to check a collection of these point classes to get rid of all duplicates, so I'm using LINQ's Distinct() method on my collection. However this method uses GetHashcode(), not Equals() to determine if the instances are equal. So, I've overriden GetHashcode() which uses the GetHashcode of the double class.

But, the above example fails because obviously they're different values and therefore generate different hashcodes. Is there any way that two numbers that are within 0.001 of each other can generate the same hashcode? (Note the numbers don't know about each other as GetHashcode is called separately on different class instances.) I've tried numerous ways which work for some examples but not for others.

One example is truncating the number to 3dp (multiply it by 10^3, then truncate it) and creating the hashcode on the result - which works for the above example (699 == 699.) But this doesn't work for 0.69990821 and 0.70000120 (699 != 700.) I've tried rounding, which works for the second set of numbers (0.700 == 0.700) but not for the first (0.699 != 0.700.) I've even tried truncating the number to 3dp then adjusting it up to the next even number, which works for both the previous examples, but not for 12.9809 and 12.9818 (12980 != 12982.)

Is there another way, or should I scrap the Equals, ==, != and GetHashcode overrides, and create my own MyPoint.IsEqualTo() and MyPointCollection.Distinct() methods?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风情万种。 2024-08-28 10:48:02

它无法写入正确的哈希码。让我们证明一下：
我们有2分。
var a = point1.GetHashCode();
var b = point2.GetHashCode();

if a!= b 让我们在 point1 和 point2 之间创建点。等等。

经过这样的操作，我们将创建一条线，其中每个点都靠近其他点，并且它们的哈希码将相同。
因此 point1 和 point2 的哈希码应该相等。

所以像这样过度：

public override int GetHashCode()
{
    return 0;
}

并实现你们平等。

It's unable to write correct hashcode. let proof it:
we have 2 points.
var a = point1.GetHashCode();
var b = point2.GetHashCode();

if a!= b let create point between point1 and point2. and so on.

After such operations we will create line, where each point is near to some other point and their hashcodes will be the same.
So hash code for point1 and point2 should be equals.

So overrtide like this:

public override int GetHashCode()
{
    return 0;
}

and implement you equals.

回复收藏 0 原文

人疚 2024-08-28 10:48:02

我认为你不应该重写 Equals()、==、!= 或 GetHashCode()

如果你重写对于其中任何一个，您都应该确保它们的语义不会改变。在你的例子中，他们这样做了。

例如，您的 == 不能传递，如果 P1 距 P2 0.001 毫米，P2 距 P3 0.001 毫米，P1 距 P3 0.002 毫米，则 P1 == P2，P2 == P3 且 P1 == P3，这不是你想要的。一般来说，你最终得到的所有点都等于所有其他点。

我会坚持使用单独的方法来确定点是否足够接近。

编辑

通过覆盖==，您现在可以编写如下代码：

if(P1 == P2 && P2 == P3 && P1 != P3)
{
    // Code here gets executed
}

I think you should not override Equals(), ==, != or GetHashCode()

If you override any of these you should make sure that their semantics do not change. In your example they do.

For instance your == cannot be transitive for it it is then if P1 is 0.001 mm from P2, P2 is 0.001 mm from P3 and P1 is 0.002 mm from P3 then P1 == P2, P2 == P3 and P1 == P3, which is not what you want. In general you end up with all points being equal to all other points.

I would just stick to using a separate method for determining if points are close enough.

EDIT

With your override of == you can now write code like this:

if(P1 == P2 && P2 == P3 && P1 != P3)
{
    // Code here gets executed
}

回复收藏 0 原文

你是年少的欢喜 2024-08-28 10:48:02

删除对 Distinct 方法的依赖会更容易。实现 System.Collections.IComparer （或通用等效项）并使用简单的集合（例如列表）。然后使用比较器确定该项目是否在列表中，如果已包含则不添加它。

回复收藏 0 原文

画尸师 2024-08-28 10:48:02

我猜想，如果您始终返回相同的散列（例如 0），LinQ 将尝试将所有元素与 equals 进行比较。毕竟，哈希对于证明两个元素不同而不是相等很有用。

但无论如何，我建议您使用更适合该领域的结构和算法，例如二元分割分区 (BSP) 树。

回复收藏 0 原文

能怎样 2024-08-28 10:48:02

这应该是对 Steck 和 Paolo 所说内容的更清晰的解释。

假设您确实按照您想要的方式编写了 GetHashCode 方法。

然后，对于任何点 a 和 b，无论它们之间的距离如何，a.GetHashCode() == b.GetHashCode()。

证明：假设a < b。将 a 和 b 之间的距离分割成小于 0.001 的段。即a0 = a、a1 = a0 + 0.0005、a2 = a1 + 0.0005等，直到到达b代码>.

然后a.GetHashCode() == a1.GetHashCode() == a2.GetHashCode() == ... == b.GetHashCode()。

回复收藏 0 原文

高冷爸爸 2024-08-28 10:48:02

我应该废弃 Equals、==、!= 和 GetHashcode 重写，并创建自己的 MyPoint.IsEqualTo() 和 MyPointCollection.Distinct() 方法吗？

是的。

但它不一定是某种完全不同的数据结构。在检查重复项时，您需要检查相邻的哈希码，例如 (x+0.001,y)、(x,y-0.001) 等的哈希值。与通常的重复数据删除查找相比，这只会导致常数因子减慢，而且并不复杂，因此这可能是可行的方法。（这是一个显而易见的观点，但我还没有看到它在这里明确提出。）

更新：为了澄清这一点，让我们看一下问题的一维版本。 “点”是单个数字 x。当abs(x1 - x2) abs(x1 - x2) 时，我们认为x1和x2匹配。 .001。现在我们想查明 x 是否与 {x_0, ..., x_n} 中的任何一个匹配。 x_i 保存在哈希表中，其中 hash(x) = h(floor(1000*x)) 代表一些传播事物的函数 h()。要查看 x 是否已在表中，我们计算 hash(x-.001)、hash(x) 和 hash(x+.001)< /code>，然后我们测试 x 是否与三个存储桶中任何一个中的任何 x_i 匹配。任何匹配的 x_i 不能位于任何其他存储桶中。

在 2-d 变体中，有 9 个相邻的桶需要检查（算上中间的）；在 3-d 中，27。

回复收藏 0 原文

萌能量女王 2024-08-28 10:48:02

这是一些代码来展示我在做什么。 “原始”中的每对数字应返回相同的值。

int tolerance = 3;
double[] original = new double[] {
0.69912839,
0.69990821,

0.69990821,
0.70000120,

12.980984087,
12.981808908
};
double[] modified = new double[original.Length];

for (int i = 0; i < original.Length; i++)
{
modified[i] = original[i];

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);

if (modified[i] % 2 != 0)
{
modified[i]++;
}
/* End number adjustment logic */

Console.WriteLine(modified[i]);

if (i % 2 != 0)
{
Console.WriteLine(string.Empty);
}
}

上面的方法是“截断到 3dp 然后调整到最接近的偶数”方法。接下来是 truncate 方法（替换开始/结束注释之间的代码）：

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);
/* End number adjustment logic */

这是 round 方法：

/* Begin number adjustment logic */
modified[i] = Math.Round(modified[i], tolerance);
/* End number adjustment logic */

Here's some code to show what I'm doing. Each pair of numbers in "original" should return the same value.

int tolerance = 3;
double[] original = new double[] {
0.69912839,
0.69990821,

0.69990821,
0.70000120,

12.980984087,
12.981808908
};
double[] modified = new double[original.Length];

for (int i = 0; i < original.Length; i++)
{
modified[i] = original[i];

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);

if (modified[i] % 2 != 0)
{
modified[i]++;
}
/* End number adjustment logic */

Console.WriteLine(modified[i]);

if (i % 2 != 0)
{
Console.WriteLine(string.Empty);
}
}

That above method is the "truncate to 3dp then adjust to the nearest even" method. Next is the truncate method (replace code between the begin/end comments):

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);
/* End number adjustment logic */

This is the round method:

/* Begin number adjustment logic */
modified[i] = Math.Round(modified[i], tolerance);
/* End number adjustment logic */

回复收藏 0 原文

~没有更多了~

关于作者

帝王念

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

为近似相似的数字生成相同的哈希码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

烙印

singlesman

给自己一个微笑

独孤求败

晨钟暮鼓

我是自愿种绣球花的

友情链接

为近似相似的数字生成相同的哈希码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

烙印

singlesman

给自己一个微笑

独孤求败

晨钟暮鼓

我是自愿种绣球花的

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。