Collision-free hash function for a specific data structure
Is it possible to create a collision-free hash function for a data structure with the following properties?
- The data structure is int[][][]
- It contains no duplicates
- The range of the integers it contains is known. Let's say it's 0..1000; the maximum integer is definitely not greater than 10000.
The big problem is that this hash function should also be very fast.
Is there a way to create such a hash function? Maybe at run time, depending on the integer range?
ADDITION: I should say that the purpose of this hash function is to quickly check whether a particular combination has already been processed. So when some combination of numbers in the data structure is processed, I calculate the hash value and store it. Then, when processing another combination of numbers within the data structure, I compare the hash values.
I think what you want is a "perfect hash" or even a "minimal perfect hash":
http://en.wikipedia.org/wiki/Perfect_hash_function
Edit: That said, if you're certain you'll never go above [0...1000], then depending on what you need to do, you can probably simply "bucket" your results directly in an array. If you don't have many elements, that array will be sparse (and hence a bit of a waste), but for at most 1001 elements going from [0...1000], an Object[1001] (or int[1001] or whatever) will probably do.
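A minimal sketch of the array-bucketing idea, written in Java as an assumption (the question does not name a language). It assumes each key is a single int in 0..1000; the class ProcessedTable and its methods are hypothetical. Indexing a boolean[] directly by the key is a trivial perfect hash (the identity function): every key owns exactly one slot, so two keys can never collide.

```java
// Hypothetical helper: a direct-address table over the fixed key range 0..maxKey.
// The "hash" of a key is the key itself, so distinct keys never share a slot.
final class ProcessedTable {
    private final boolean[] seen;

    ProcessedTable(int maxKey) {
        // One slot per possible key, e.g. 1001 slots for keys 0..1000.
        seen = new boolean[maxKey + 1];
    }

    // Marks key as processed; returns true if it had not been processed before.
    boolean markProcessed(int key) {
        if (key < 0 || key >= seen.length) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        boolean wasNew = !seen[key];
        seen[key] = true;
        return wasNew;
    }

    // Out-of-range keys are reported as not processed.
    boolean isProcessed(int key) {
        return key >= 0 && key < seen.length && seen[key];
    }

    public static void main(String[] args) {
        ProcessedTable table = new ProcessedTable(1000);
        System.out.println(table.markProcessed(42)); // true: first time
        System.out.println(table.markProcessed(42)); // false: already processed
        System.out.println(table.isProcessed(999));  // false: never marked
    }
}
```

The table's size is fixed by the key range, not by how many keys are actually stored, which is the sparseness trade-off mentioned above.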
What if you just use a 64-bit value and store the location in each level of the hierarchy in its own section of bits?
Something like (off the top of my head):
hash = ((long) a << 34) | ((long) b << 17) | c
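A sketch of the bit-packing idea in Java, assuming a, b and c are the positions in each of the three levels of the int[][][] and that each is at most 10000 (the bound given in the question). Seventeen bits hold 0..131071, so every such triple maps to a distinct long. The class PackedKey and its methods are hypothetical. The shifts must be done on a 64-bit value: in Java, shifting an int by 34 actually shifts it by 34 mod 32 = 2.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: packs three non-negative indices into one long, 17 bits each.
// 17 bits hold 0..131071, enough for the question's upper bound of 10000.
final class PackedKey {
    static final int BITS = 17;
    static final long MASK = (1L << BITS) - 1; // 131071

    // Rejects values that would not fit in 17 bits (they would break uniqueness).
    private static long checked(int v) {
        if (v < 0 || v > MASK) {
            throw new IllegalArgumentException("index out of 17-bit range: " + v);
        }
        return v; // widened to long, so the shifts below are 64-bit shifts
    }

    // a goes into bits 34..50, b into bits 17..33, c into bits 0..16.
    static long pack(int a, int b, int c) {
        return (checked(a) << (2 * BITS)) | (checked(b) << BITS) | checked(c);
    }

    // Recovers the original triple, showing that no information is lost.
    static int[] unpack(long key) {
        return new int[] {
            (int) ((key >>> (2 * BITS)) & MASK),
            (int) ((key >>> BITS) & MASK),
            (int) (key & MASK)
        };
    }

    public static void main(String[] args) {
        Set<Long> processed = new HashSet<>();
        long key = pack(10, 20, 30);
        System.out.println(processed.add(key));              // true: first time seen
        System.out.println(processed.add(pack(10, 20, 30))); // false: already processed
        System.out.println(processed.add(pack(30, 20, 10))); // true: different triple, different key
        System.out.println(Arrays.toString(unpack(key)));    // [10, 20, 30]
    }
}
```

Storing the packed keys in a HashSet<Long> gives the "already processed?" check from the question. The set may put different keys in the same internal bucket, but it compares the full 64-bit values, so two distinct triples are never mistaken for each other.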
A perfect hash is likely not feasible, because it can take a lot of computation time to find one for your data set.
Would a bool[][][] work for you, where true means a certain x,y,z combination has been processed? Below is a prototype for a three-dimensional bit array. Because of the limits of an Int32, this will only work up to a maximum index of about 1,024 (but would fit within 128 MB, since 1,024 × 1,024 × 1,024 bits is 128 MB). You could get to 10,000 by creating a BitArray[][]. However, this is probably not practical at that size, because 10,000 × 10,000 × 10,000 bits would occupy over 116 GB of RAM. Depending on your exact problem size and needs, a plain old hash table (with collisions) may be your best bet. That said, here is the prototype code: