What is the optimal way to compute a hash code for a set of points?

Posted on 2024-08-02 10:12:19

I'm looking for the optimal way to compute a hash code for a set of two-dimensional points (so that I can store polygons in a hash table).

There are some obvious ways to do that, such as concatenating all the point coordinates into a string and taking its hash code, but this would be very slow.

At the other end of the speed/collision spectrum, I could simply sum all the coordinates, which would be very fast but would also produce a lot of collisions.

What's the optimal way to compute a hash code for a set of points?

Is the optimal solution different if the coordinates are integers (as opposed to real numbers)?

Edit: I'm using .NET, so the hash code should be 32 bits long.

Comments (7)

痕至 2024-08-09 10:12:19

There is no optimal way to do this. It all depends on how expensive a hash you can afford: you have to trade off speed against diffusion. Keep in mind that there is no such thing as an optimal solution unless you know exactly what you are going to hash. In some cases XOR can be good enough.

Take this code, for instance:

/* JS hash: folds each input byte into the running hash with shifts, an add and an XOR. */
unsigned int JSHash(const char* str, unsigned int len)
{
    unsigned int hash = 1315423911;
    unsigned int i    = 0;

    for(i = 0; i < len; str++, i++)
    {
        hash ^= ((hash << 5) + (*str) + (hash >> 2));
    }

    return hash;
}
/* End Of JS Hash Function */

You said that aggregating the points together is too slow. If you adapt the code above, it does not need any kind of aggregation: it makes a single pass through the data (not much different from summing). And if you are using integers or floats, you would probably adjust the shifts (<< and >> are shift operations, which together work somewhat like a bitwise rotation) to fit your data type.

Other hash functions are listed here:
http://www.partow.net/programming/hashfunctions/
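
As a rough sketch of what adapting the function above to points might look like in C# (the language of the question): the class and method names are made up for illustration, and integer coordinates stored as (X, Y) tuples are an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

public static class PointHashing
{
    // Hypothetical helper: runs the JS-hash mixing step once per coordinate
    // instead of once per byte of a string.
    public static int JsHashPoints(IEnumerable<(int X, int Y)> points)
    {
        uint hash = 1315423911;
        foreach (var (x, y) in points)
        {
            // unchecked: the casts and additions are meant to wrap around
            hash ^= unchecked((hash << 5) + (uint)x + (hash >> 2));
            hash ^= unchecked((hash << 5) + (uint)y + (hash >> 2));
        }
        return unchecked((int)hash);
    }
}
```

For real-valued coordinates, one option would be to feed `x.GetHashCode()` and `y.GetHashCode()` into the same mixing step. Note that the result depends on the order in which the points are visited.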

流殇 2024-08-09 10:12:19

What is optimal depends on what you require of the hash computation.

Faster computation will come at the cost of more hash collisions.

Do you have a hard bound on either one? It's going to come down to a mathematical analysis of how much each percent of hash collisions is going to cost you in terms of performance.
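
One way to put numbers on that trade-off is to count collisions for a candidate hash function over a sample of your own data. The helper below is a minimal sketch; `CollisionCounter` and `CountCollisions` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class CollisionCounter
{
    // Returns how many items landed on a hash value that an earlier item already used.
    public static int CountCollisions<T>(IEnumerable<T> items, Func<T, int> hash)
    {
        var seen = new HashSet<int>();
        int collisions = 0;
        foreach (var item in items)
        {
            // HashSet.Add returns false when the value is already present
            if (!seen.Add(hash(item))) collisions++;
        }
        return collisions;
    }
}
```

For example, with the points (0, 1), (1, 0) and (2, 2) and the "sum of coordinates" hash `p => p.X + p.Y`, the first two points both hash to 1, so the function reports one collision.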

幻想少年梦 2024-08-09 10:12:19

If your data set happens to consist of polygons that may share edges but do not otherwise overlap, you only need to hash three points of each polygon to avoid collisions.

Edit: Reconsidering this, picturing possible collisions with concave/convex boundaries, it is just as well your polygons overlap. - Sigh

Alas: When the convex and the concave meet, it always gets me into trouble. :-P

时光是把杀猪刀 2024-08-09 10:12:19

Alternatively, you can just XOR the hashes of the individual points.

return p1.GetHashCode() ^ p2.GetHashCode();

It depends on what the values are going to be, anyway. You could probably just add them.
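
Extended from two points to a whole polygon, the same idea could look like the sketch below. The names are hypothetical, and points are assumed to be (X, Y) tuples so that the built-in tuple `GetHashCode()` can stand in for the point's own hash.

```csharp
using System;
using System.Collections.Generic;

public static class XorHashing
{
    // XORs together the hash codes of all points in the sequence.
    public static int XorOfPointHashes(IEnumerable<(int X, int Y)> points)
    {
        int hash = 0;
        foreach (var p in points)
        {
            hash ^= p.GetHashCode();
        }
        return hash;
    }
}
```

Because XOR is commutative and associative, the result does not depend on the order of the points. The price is that any point appearing an even number of times cancels itself out, which is one source of extra collisions.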

鹤仙姿 2024-08-09 10:12:19

If you want polygons that are defined clockwise and anticlockwise, but are otherwise equal, to compare as equal, then you'll have to create a canonicalization function: a function that, given a polygon's points starting from any point and going in either direction, returns the points in the same order.

One algorithm that I can think of is to find the minimum of all possible sequences of points:

  1. Find the set of top-leftmost points (the points with minimum x among the points with minimum y); these are the starting points.
  2. For each starting point and each direction, iteratively add connected points in the given direction and eliminate all candidates that aren't top-leftmost in the current iteration.
    Halt when only one (starting point, direction) pair is left or when n-1 iterations are completed. If more than one starting point and direction remain, choose any of them; they are all isomorphic.
  3. Reorder the points starting from the found point in the found direction.

This is O(n^2) in the worst case for fully degenerate polygons, but if your polygons don't have overlapping points it is O(n), with a pretty small constant factor.
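
The idea of taking the minimum over all traversals can be sketched as follows. This is a simpler brute-force variant, not the elimination procedure described above: it compares all 2n traversals (n starting points, 2 directions) directly and is always O(n^2) in the worst case. The class name, the tuple representation and integer coordinates are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class PolygonCanonicalizer
{
    // Orders points top-to-bottom, then left-to-right (smaller Y first, then smaller X).
    private static int ComparePoints((int X, int Y) a, (int X, int Y) b)
    {
        int c = a.Y.CompareTo(b.Y);
        return c != 0 ? c : a.X.CompareTo(b.X);
    }

    // Wraps an index into the range [0, n), also for negative values.
    private static int Mod(int a, int n) => ((a % n) + n) % n;

    // Returns the lexicographically smallest traversal of the polygon.
    public static (int X, int Y)[] Canonicalize(IReadOnlyList<(int X, int Y)> pts)
    {
        int n = pts.Count;
        int bestStart = 0, bestStep = 1;
        for (int start = 0; start < n; start++)
        {
            foreach (int step in new[] { 1, -1 })
            {
                // Compare this traversal with the best one found so far.
                for (int i = 0; i < n; i++)
                {
                    var candidate = pts[Mod(start + step * i, n)];
                    var best = pts[Mod(bestStart + bestStep * i, n)];
                    int c = ComparePoints(candidate, best);
                    if (c < 0) { bestStart = start; bestStep = step; }
                    if (c != 0) break;
                }
            }
        }
        var result = new (int X, int Y)[n];
        for (int i = 0; i < n; i++)
        {
            result[i] = pts[Mod(bestStart + bestStep * i, n)];
        }
        return result;
    }
}
```

For the unit square, both (0,0), (1,0), (1,1), (0,1) and the reversed, rotated listing (1,1), (1,0), (0,0), (0,1) canonicalize to (0,0), (1,0), (1,1), (0,1).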

With the canonicalized order you can easily compare two polygons for equality: just compare the points pairwise. Hash code calculation is also trivial; use any reasonably robust hash combination method. For example:

int result = 0;
unchecked { // let the multiplication wrap around instead of throwing in checked builds
    foreach (var point in this.points) {
        result = (result * 31 + point.X.GetHashCode()) * 31 + point.Y.GetHashCode();
    }
}

栀子花开つ 2024-08-09 10:12:19

For a hash that is very quick to calculate and is independent of clockwise/counter-clockwise orientation, you would not want to depend on finding a well-defined ordering of the points.

This limits your hash-combining operations to ones that commute. We therefore keep all data that is independent of orientation (the x values, the y values and the point count) separate during the combining step.

Here is a simple solution:

Assume a combine function int -> int -> int that is commutative and associative.
Either of the following is a starting point, though note that only the XOR version is actually order-independent; h * 31 + x gives different results when the points are visited in a different order:

public static int combine(int h, int x)
{
    return h * 31 + x;
} 

public static int combine(int h, int x)
{
    return h ^ x;
} 
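
A quick way to check whether a candidate combine function is order-independent is to fold the same values in two different orders. The sketch below does that; `CombineDemo`, `CombineMul`, `CombineXor` and `Fold` are hypothetical names for this example only.

```csharp
using System;

public static class CombineDemo
{
    public static int CombineMul(int h, int x) => unchecked(h * 31 + x);
    public static int CombineXor(int h, int x) => h ^ x;

    // Folds the values left to right with the given combine function, starting from 0.
    public static int Fold(Func<int, int, int> combine, params int[] values)
    {
        int h = 0;
        foreach (int v in values)
        {
            h = combine(h, v);
        }
        return h;
    }
}
```

`Fold(CombineMul, 1, 2)` gives (0 * 31 + 1) * 31 + 2 = 33, while `Fold(CombineMul, 2, 1)` gives 2 * 31 + 1 = 63, so the multiplicative version depends on traversal order. `Fold(CombineXor, 1, 2)` and `Fold(CombineXor, 2, 1)` both give 3.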

Then we can do the following:

public override int GetHashCode()
{
    int x = 0;
    int y = 0;
    uint count = 0;
    // "polygon" is the collection of points held by this object
    foreach (var p in polygon)
    {
        x = combine(x, p.X);
        y = combine(y, p.Y);
        count++;
    }
    // simplified, unrolled Murmur2 hash for end stage
    // (the multiplications are meant to wrap around, as they do in C#'s default unchecked context)
    const uint m = 0x5bd1e995;
    const int r = 24;
    uint h = count;
    uint k = ReinterpretInt32ToUInt32(x);
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    k = ReinterpretInt32ToUInt32(y);
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    // avalanche
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return ReinterpretUInt32ToInt32(h);
}

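The code above relies on these two helpers:

public static uint ReinterpretInt32ToUInt32(int i)
{
    // same bit pattern, reinterpreted as unsigned; no unsafe code needed
    return unchecked((uint) i);
}

public static int ReinterpretUInt32ToInt32(uint u)
{
    // same bit pattern, reinterpreted as signed
    return unchecked((int) u);
}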

This will not be the best hash in terms of collision avoidance, but it should be very fast to calculate, and you may find it sufficient for your needs.
