Perfect hash function
I'm attempting to hash the values
10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0
I need a function that will map them to an array that has a size of 13 without causing any collisions.
I've spent several hours thinking this over and googling and can't figure this out. I haven't come close to a viable solution.
How would I go about finding a hash function of this sort? I've played with gperf, but I don't really understand it and I couldn't get the results I was looking for.
If you know the exact keys, then it is trivial to produce a perfect hash function.
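Here is one trivial construction, sketched in Python under the assumption that the keys are distinct. It generates a C switch that returns each known key's position in the list. The function name perfect_hash and the -1 returned for unknown keys are illustrative choices, not part of the original answer.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def emit_switch(keys, name="perfect_hash"):
    """Generate C source for a switch that gives each known key its own index.

    Assumes the keys are distinct (duplicates would give duplicate case labels).
    The default function name 'perfect_hash' is hypothetical.
    """
    lines = [f"int {name}(int x) {{", "    switch (x) {"]
    for i, key in enumerate(keys):
        lines.append(f"    case {key}: return {i};")
    lines.append("    default: return -1; /* not one of the known keys */")
    lines += ["    }", "}"]
    return "\n".join(lines)

print(emit_switch(KEYS))
```

The indices come out as 0..10, so this hash is minimal as well as perfect. It needs only 11 slots, not 13.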
Found One
I tried a few things and found one semi-manually:
The semi-manual part was the following Ruby script that I used to test candidate functions with a range of parameters:
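The Ruby script itself is not reproduced here. The Python sketch below is a rough stand-in that applies a generic collision test to one parameterised family, (c % (n + 1)) % 13. The family, the names candidate, is_perfect and search, and the bound 2000 are all assumptions for illustration. The (1369%(n+1))%13 answer further down this page uses the same form, so this search is guaranteed to stop at or before c = 1369.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
TABLE_SIZE = 13

def candidate(c, n):
    """Hypothetical candidate family; n + 1 avoids dividing by zero for the key 0."""
    return (c % (n + 1)) % TABLE_SIZE

def is_perfect(slots, size=TABLE_SIZE):
    """True if every slot is in range(size) and no two keys share a slot."""
    return all(0 <= s < size for s in slots) and len(set(slots)) == len(slots)

def search(keys, max_c=2000):
    """Return the smallest c in [1, max_c) whose candidate is collision-free, else None."""
    for c in range(1, max_c):
        if is_perfect([candidate(c, n) for n in keys]):
            return c
    return None

print(search(KEYS))
```

Swapping in a different candidate function (and a different parameter range) is all it takes to test another family.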
On some platforms (e.g. embedded), the modulo operation is expensive, so % 13 is better avoided. But an AND of the low-order bits is cheap, and it is equivalent to a modulo by a power of 2. I tried writing a simple program (in Python) to search for a perfect hash of your 11 data points, using simple forms such as ((x << a) ^ (x << b)) & 0xF (where, for example, & 0xF is equivalent to % 16 and gives a result in the range 0..15). I was able to find the following collision-free hash, which gives an index in the range 0..15 (expressed as a C macro):
Here is the Python program I used:
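The program and macro are not shown above, so here is a minimal Python reconstruction of such a search. The names and the shift bound of 8 are assumptions. With two left shifts, ((x << a) ^ (x << b)) & 0xF depends only on the low 4 bits of x. The keys 10 and 58 share those bits (1010), and so do 32, 400 and 0 (0000), so that exact form can never separate them. This sketch therefore pairs a left shift with a right shift:

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def shift_hash(x, a, b):
    """((x << a) ^ (x >> b)) & 0xF: mix two shifts, keep the low 4 bits (same as % 16)."""
    return ((x << a) ^ (x >> b)) & 0xF

def find_shift_hash(keys, max_shift=8):
    """Return the first (a, b) for which shift_hash is collision-free on keys, else None."""
    for a in range(max_shift):
        for b in range(max_shift):
            if len({shift_hash(x, a, b) for x in keys}) == len(keys):
                return a, b
    return None

print(find_shift_hash(KEYS))
```

For instance, a = b = 2, i.e. ((x << 2) ^ (x >> 2)) & 0xF, sends the keys to 10, 9, 8, 15, 6, 7, 12, 3, 2, 4, 0, which are all distinct. The search may report a different pair, since it stops at the first hit. In C, any such hash costs only two shifts, an XOR and an AND, with no division.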
Bob Jenkins has a program for this too: http://burtleburtle.net/bob/hash/perfect.html
Unless you're very lucky, there's no "nice" perfect hash function for a given dataset. Perfect hashing algorithms usually use a simple hashing function on the keys (using enough bits so it's collision-free) then use a table to finish it off.
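Here is a toy Python sketch of the "simple hash plus a table" idea in the hash-and-displace style. It is not Bob Jenkins' algorithm. A first-level hash x % r picks a bucket, a small table g stores one displacement per bucket, and the final index is (x + g[x % r]) % 13. The bucket count r, the greedy largest-bucket-first placement, and all function names are assumptions chosen for illustration.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
TABLE_SIZE = 13

def build_displacements(keys, table_size, num_buckets):
    """Greedily pick g so that (x + g[x % num_buckets]) % table_size is collision-free.

    Returns the list g, or None if some bucket cannot be placed.
    """
    buckets = [[] for _ in range(num_buckets)]
    for x in keys:
        buckets[x % num_buckets].append(x)
    g = [0] * num_buckets
    used = set()
    # Place the largest buckets first; they are the hardest to fit.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        for d in range(table_size):
            slots = {(x + d) % table_size for x in buckets[b]}
            if len(slots) == len(buckets[b]) and not slots & used:
                g[b] = d
                used |= slots
                break
        else:
            return None  # no displacement fits this bucket
    return g

def find_table(keys, table_size):
    """Try bucket counts 1..len(keys); return (num_buckets, g) for the first that works."""
    for num_buckets in range(1, len(keys) + 1):
        g = build_displacements(keys, table_size, num_buckets)
        if g is not None:
            return num_buckets, g
    return None

def lookup_index(x, num_buckets, g, table_size=TABLE_SIZE):
    """Final index: first-level bucket hash, then the stored displacement."""
    return (x + g[x % num_buckets]) % table_size

num_buckets, g = find_table(KEYS, TABLE_SIZE)
print(num_buckets, g, [lookup_index(x, num_buckets, g) for x in KEYS])
```

A lookup costs two cheap hashes plus one table read. The price is the extra table g, which is the trade-off described above.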
Just some quasi-analytical ramblings:
In your set of numbers, eleven in all, three are odd and eight are even.
The simplest form of hashing, % 13, gives the following hash values:
10 - 10,
100 - 9,
32 - 6,
45 - 6,
58 - 6,
126 - 9,
3 - 3,
29 - 3,
200 - 5,
400 - 10,
0 - 0
This, of course, is unusable because of the number of collisions (32, 45 and 58, for instance, all land in slot 6). Something more elaborate is needed.
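The collisions are easy to reproduce. This short Python sketch (names are illustrative) groups the keys by x % 13 and keeps only the slots that hold more than one key:

```python
from collections import defaultdict

KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def buckets_mod(keys, m=13):
    """Group keys by x % m; any group with more than one key is a collision."""
    groups = defaultdict(list)
    for x in keys:
        groups[x % m].append(x)
    return dict(groups)

collisions = {slot: ks for slot, ks in buckets_mod(KEYS).items() if len(ks) > 1}
print(collisions)
# {10: [10, 400], 9: [100, 126], 6: [32, 45, 58], 3: [3, 29]}
```

Only 6 of the 13 slots are used, and four of them are shared.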
Why state the obvious?
Considering how few numbers there are, any elaborate (or rather, "less simple") algorithm will likely be slower than either a switch statement or (my preference) simply searching an unsigned short/long vector of eleven entries and using the index of the match.
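Here is a Python sketch of the vector search, assuming the key order from the question. In C, the same loop would run over a static array of eleven unsigned values, and the returned position would index the data table.

```python
KEYS = (10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0)

def index_of(x, keys=KEYS):
    """Linear scan over the known keys; the position of the match is the table index."""
    for i, k in enumerate(keys):
        if k == x:
            return i
    return -1  # x is not one of the known keys

print(index_of(58), index_of(7))  # 4 -1
```

The indices come out as 0..10, so the companion data table needs only eleven slots.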
Why use a vector search?
I did a quick check: using the SHA256 hash function and then taking the result modulo 13 worked when I tried it in Mathematica. For C++, this function should be in the OpenSSL library. See this post.
If you are doing a lot of hashing and lookups, though, repeated modular division is a fairly expensive operation. There is another way of mapping an n-bit hash value to an i-bit index; see this post by Michael Mitzenmacher about how to do it with a bit-shift operation in C. Hope that helps.
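Here is a rough Python sketch of the SHA-256 idea, under two assumptions. First, each key is hashed as its decimal ASCII text; Mathematica's Hash may hash a different byte representation, so its collision result need not carry over. Second, only the top 32 bits of the digest are kept. Instead of % 13, the sketch uses the multiply-shift range reduction (h * 13) >> 32, a well-known division-free alternative. It may or may not be the technique in Mitzenmacher's post. The code prints whether this particular mapping happens to be collision-free rather than assuming it.

```python
import hashlib

KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def sha256_32(x):
    """Top 32 bits of SHA-256 over the decimal text of x (an assumed encoding)."""
    digest = hashlib.sha256(str(x).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")

def reduce_range(h, n):
    """Map a 32-bit hash h into range(n) with one multiply and one shift, no division."""
    return (h * n) >> 32

indices = {x: reduce_range(sha256_32(x), 13) for x in KEYS}
print(indices)
print("collision-free:", len(set(indices.values())) == len(KEYS))
```

With 11 keys and 13 slots, a general-purpose hash is collision-free only by luck. If the check fails, a different encoding or a seed can be tried.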
Try the following, which maps your n values to unique indices between 0 and 12:
(1369%(n+1))%13
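Here is a quick Python check of that formula over the question's keys. The function name mod_hash is ours, and n + 1 keeps the divisor nonzero for the key 0.

```python
KEYS = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]

def mod_hash(n):
    """(1369 % (n + 1)) % 13, as given in the answer."""
    return (1369 % (n + 1)) % 13

print([mod_hash(n) for n in KEYS])
# [5, 4, 3, 9, 12, 8, 1, 6, 7, 10, 0]
```

The eleven results are distinct, so this is a perfect hash into 13 slots, with slots 2 and 11 left empty. It still performs two modulo operations per lookup.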