Hash function with good uniformity for unknown input
I'm looking for a hash function that partitions a large set of input
data with good uniformity into a small number of partitions (say 100 or
256). That means I expect a lot of collisions, and I don't care about them.
The input data is not known in advance. I expect strings with a length
between maybe 6 and 100 bytes. The strings may be very badly distributed
(e.g. a large part filled with spaces or containing only digits).
CRC algorithms are among the first ideas that spring to mind.
CRC8 has been proposed, but without giving information about its
uniformity; for CRC32 apparently the uniformity is not that good.
There are lists of simple or general-purpose hash functions,
but they say nothing about their uniformity.
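Since none of these sources report uniformity, one option is to measure it on sample data. The following is a minimal sketch, not a definitive test: it computes a chi-squared statistic over the bucket counts. The names `chi_squared`, `hash_fn` and `n_buckets` are made up for illustration, and the sample keys are assumed to be representative of the real input.

```python
from collections import Counter


def chi_squared(keys, hash_fn, n_buckets):
    """Chi-squared statistic of the bucket counts against a uniform spread.

    keys      -- a list of sample inputs (assumed representative)
    hash_fn   -- any function mapping a key to a non-negative integer
    n_buckets -- number of partitions, e.g. 100 or 256
    """
    # Count how many keys land in each bucket.
    counts = Counter(hash_fn(key) % n_buckets for key in keys)
    # With a perfectly uniform hash every bucket would hold this many keys.
    expected = len(keys) / n_buckets
    # Sum the squared deviations, including buckets that stayed empty.
    return sum(
        (counts.get(bucket, 0) - expected) ** 2 / expected
        for bucket in range(n_buckets)
    )
```

For a hash that spreads the keys well, the statistic should be close to `n_buckets - 1`; much larger values indicate clustering. Any candidate can be passed as `hash_fn`, for example `zlib.crc32` on byte strings.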
Bob Jenkins has a thorough article on hash functions that return a
32-bit value. I suppose that for a uniformly distributed 32-bit value
every 8-bit subset should also be evenly distributed, so there
are good candidates. But maybe it's overkill to reduce a 32-bit value to
an 8-bit value if there are simpler algorithms for 8 bits?
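To make the reduction step concrete, here is a hedged sketch of three common ways to turn a 32-bit hash value into a partition number. The function names are hypothetical, and the input is assumed to be an unsigned 32-bit integer produced by whatever hash is chosen.

```python
def low_byte(h32):
    """Keep only the lowest 8 bits (fine if the low bits are well mixed)."""
    return h32 & 0xFF


def xor_fold(h32):
    """XOR the four bytes together so every input bit affects the result."""
    h32 ^= h32 >> 16  # fold the upper half into the lower half
    h32 ^= h32 >> 8   # fold the remaining two bytes into one
    return h32 & 0xFF


def to_partition(h32, n_partitions):
    """Modulo reduction for partition counts that are not a power of two."""
    return h32 % n_partitions
```

For 100 partitions, the modulo of the full 32-bit value introduces only a very small bias, because 2^32 is not a multiple of 100.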
Comments (1)
I found the sdbm algorithm to show good uniformity, being quite simple:
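The code from the answer is not preserved here. As a stand-in, this is a minimal sketch of the commonly published sdbm recurrence. Truncating to 32 bits and the helper `partition` are assumptions added for illustration, not necessarily what the answerer used.

```python
def sdbm(data: bytes) -> int:
    """sdbm string hash, truncated to 32 bits (the truncation is an assumption)."""
    h = 0
    for byte in data:
        # Shift-and-subtract form of h = h * 65599 + byte.
        h = (byte + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def partition(data: bytes, n_partitions: int = 256) -> int:
    """Hypothetical helper: map a byte string to a partition index."""
    return sdbm(data) % n_partitions
```

Calling `partition(b"some key", 100)` returns an index in the range 0 to 99. Text input would need to be encoded to bytes first, for example with `str.encode("utf-8")`.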