Perfect hash function

Posted 2024-10-01 11:00:10

I'm attempting to hash the values

10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0

I need a function that will map them to an array that has a size of 13 without causing any collisions.

I've spent several hours thinking this over and googling and can't figure this out. I haven't come close to a viable solution.

How would I go about finding a hash function of this sort? I've played with gperf, but I don't really understand it and I couldn't get the results I was looking for.

浪荡不羁 2024-10-08 11:00:10

If you know the exact keys, then it is trivial to produce a perfect hash function:

int hash (int n) {
  switch (n) {
    case 10:   return 0;
    case 100:  return 1;
    case 32:   return 2;
    // ...
    default:   return -1;
  }
}

心凉怎暖 2024-10-08 11:00:10

Found One

I tried a few things and found one semi-manually:

(n ^ 28) % 13

The semi-manual part was the following Ruby script, which I used to test candidate functions over a range of parameters:

t = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
(1..200).each do |i|
  t2 = t.map { |e| (e ^ i) % 13 }
  puts i if t2.uniq.length == t.length
end
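
As a quick sanity check of the result above, here is a minimal Python sketch that applies (n ^ 28) % 13 to each key and confirms that all eleven slots are distinct. The names KEYS and perfect_hash are just illustrative, not part of the original answer.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def perfect_hash(n):
    # XOR with the constant found by the search, then reduce to the table size.
    return (n ^ 28) % 13

# Map every key to its slot and check that no two keys share one.
slots = {n: perfect_hash(n) for n in KEYS}
assert len(set(slots.values())) == len(KEYS)

for key, slot in slots.items():
    print(key, "->", slot)
```

All slots fall in 0..12, so the function fits the 13-entry array asked for in the question.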

决绝 2024-10-08 11:00:10

On some platforms (e.g. embedded), the modulo operation is expensive, so % 13 is best avoided. A bitwise AND of the low-order bits, however, is cheap and is equivalent to modulo by a power of 2.

I tried writing a simple program (in Python) to search for a perfect hash of your 11 data points, using simple forms such as ((x << a) ^ (x << b)) & 0xF (where & 0xF is equivalent to % 16, giving a result in the range 0..15, for example). I was able to find the following collision-free hash which gives an index in the range 0..15 (expressed as a C macro):

#define HASH(x)    ((((x) << 2) ^ ((x) >> 2)) & 0xF)

Here is the Python program I used:

data = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def shift_right(value, shift_value):
    """Shift right, treating a negative shift as a left shift
    (Python's shift operators reject negative shift counts).
    A shift of None means the term is unused and contributes 0."""
    if shift_value is None:
        return 0
    if shift_value < 0:
        return value << (-shift_value)
    return value >> shift_value

def find_hash():
    def hashf(val, i, j=None, k=None):
        return (shift_right(val, i) ^ shift_right(val, j) ^ shift_right(val, k)) & 0xF

    # Try every pair of shifts (i, j); the third shift k is left disabled.
    for i in range(-7, 8):
        for j in range(i, 8):
            k = None
            outputs = set()
            for val in data:
                hash_val = hashf(val, i, j, k)
                if hash_val in outputs:
                    break  # collision: reject this (i, j)
                outputs.add(hash_val)
            else:
                # No break occurred, so every value landed in its own slot.
                print(i, j, k, outputs)

if __name__ == '__main__':
    find_hash()
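
To see what the HASH macro actually produces for these keys, here is a small Python sketch that mirrors it. The function name shift_hash is only illustrative. One thing worth noticing is that 45 lands in slot 15, so this hash needs a 16-entry table rather than the 13-entry one from the question.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def shift_hash(x):
    # Python equivalent of: ((((x) << 2) ^ ((x) >> 2)) & 0xF)
    return ((x << 2) ^ (x >> 2)) & 0xF

# Compute the slot for every key and confirm there are no collisions.
slots = {x: shift_hash(x) for x in KEYS}
assert len(set(slots.values())) == len(KEYS)

for key, slot in slots.items():
    print(key, "->", slot)
```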

过期情话 2024-10-08 11:00:10

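Bob Jenkins has a program for this too: http://burtleburtle.net/bob/hash/perfect.html

Unless you're very lucky, there is no "nice" perfect hash function for a given dataset. Perfect hashing algorithms usually apply a simple hash function to the keys (using enough bits that it is collision-free) and then use a table to finish the job.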
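
To make the "simple hash plus a table" idea concrete, here is a rough Python sketch of one common shape, often called hash-and-displace. This is not Bob Jenkins's program or its algorithm. The bucket count of 4, the use of key % 4 and key // 4 as the two simple hashes, and all names are assumptions chosen for illustration.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
TABLE_SIZE = 13
NUM_BUCKETS = 4  # assumed value; any small bucket count could be tried

def build_displacements(keys, table_size, num_buckets):
    """Find one displacement per bucket so that all keys get distinct slots."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[key % num_buckets].append(key)

    used = [False] * table_size
    displacements = [0] * num_buckets
    # Place the fullest buckets first: they are the hardest to fit.
    for b in sorted(range(num_buckets), key=lambda b: -len(buckets[b])):
        for d in range(table_size):
            slots = [(key // num_buckets + d) % table_size for key in buckets[b]]
            if len(set(slots)) == len(slots) and not any(used[s] for s in slots):
                for s in slots:
                    used[s] = True
                displacements[b] = d
                break
        else:
            raise ValueError("no displacement fits; try another bucket count")
    return displacements

DISPLACEMENTS = build_displacements(KEYS, TABLE_SIZE, NUM_BUCKETS)

def lookup_slot(key):
    # Simple hash of the key, corrected by the small per-bucket table.
    return (key // NUM_BUCKETS + DISPLACEMENTS[key % NUM_BUCKETS]) % TABLE_SIZE

print(DISPLACEMENTS)
print({key: lookup_slot(key) for key in KEYS})
```

The lookup itself is a couple of arithmetic operations plus one read from a four-entry table; the searching happens once, when the table is built.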

可是我不能没有你 2024-10-08 11:00:10

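Just some quasi-analytical ramblings:

In your set of numbers, eleven in all, three are odd and eight are even.
Looking at the simplest form of hashing, % 13, gives you the following hash values:
10 - 10,
100 - 9,
32 - 6,
45 - 6,
58 - 6,
126 - 9,
3 - 3,
29 - 3,
200 - 5,
400 - 10,
0 - 0

This, of course, is unusable because of the number of collisions. Something more elaborate is needed.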
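
For anyone who wants to recompute these residues, here is a minimal Python sketch that groups the keys by n % 13 and reports only the slots shared by more than one key. The helper name collisions is illustrative.

```python
from collections import defaultdict

KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def collisions(keys, size):
    """Return {slot: [keys...]} for every slot hit by more than one key."""
    slots = defaultdict(list)
    for key in keys:
        slots[key % size].append(key)
    return {slot: hit for slot, hit in slots.items() if len(hit) > 1}

print(collisions(KEYS, 13))
```

Only 200 and 0 end up alone in their slots; the other nine keys are spread over just four slots.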

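Why state the obvious?
Considering that the numbers are so few, any elaborate (or rather, "less simple") algorithm will likely be slower than either the switch statement or (which I prefer) simply searching through an unsigned short/long vector of eleven positions and using the index of the match.

Why use a vector search?

You can fine-tune it by placing the most frequently occurring values towards the beginning of the vector.
I assume the purpose is to plug the hash index into a switch with nice, sequential numbering. In that light, it seems wasteful to first use a switch to find the index and then plug it into another switch. Maybe you should consider not hashing at all and going directly to the final switch?
The switch version of the hash cannot be fine-tuned and, because the values differ so widely, will cause the compiler to generate a binary search tree. That results in a lot of compares and conditional/other jumps (especially costly), which take time (I've assumed you turned to hashing for its speed) and require space.
If you want to speed up the vector search further and are on an x86 system, you can implement it with the assembler instructions repne scasw (short) / repne scasd (long), which will be much faster. After a setup time of a few instructions, you will find the first entry in one instruction and the last in eleven, followed by a few instructions of cleanup. That means 5-10 instructions in the best case and 15-20 in the worst. This should beat the switch-based hash in all but maybe one or two cases.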
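
The plain vector search described above can be sketched in a few lines. Python is used here only to match the search script elsewhere in this thread; in C the same loop would run over an array of unsigned short. The key order shown is simply the order from the question, which is an assumption: in practice the most frequently looked-up values would go first.

```python
# Keys in lookup-priority order; the position of a key is its table index.
KEYS = (10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0)

def find_index(n, keys=KEYS):
    """Linear scan: return the position of n in keys, or -1 if absent."""
    for index, key in enumerate(keys):
        if key == n:
            return index
    return -1

print(find_index(10))   # first entry, found after one comparison
print(find_index(0))    # last entry, found after eleven comparisons
print(find_index(7))    # not one of the keys
```

The returned index is dense (0..10), so it can feed straight into a switch or a jump table.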