从线性探测转向二次探测(哈希碰撞)
我当前的哈希表实现是使用线性探测,现在我想转向二次探测(后来转向链接,也许还有双重哈希)。我读过一些文章、教程、维基百科等......但我仍然不知道我到底应该做什么。
基本上,线性探测的步长为 1,这很容易做到。当从哈希表中搜索、插入或删除元素时,我需要计算哈希值,为此,我这样做:
index = hash_function(key) % table_size;
然后,在搜索、插入或删除时,我循环遍历表,直到找到一个空闲的存储桶,如下所示
do {
if(/* CHECK IF IT'S THE ELEMENT WE WANT */) {
// FOUND ELEMENT
return;
} else {
index = (index + 1) % table_size;
}
while(/* LOOP UNTIL IT'S NECESSARY */);
:对于二次探测,我认为我需要做的是改变“索引”步长的计算方式,但这就是我不明白应该如何做。我看过各种代码,它们都有些不同。
另外,我已经看到了二次探测的一些实现,其中哈希函数被更改以适应这种情况(但不是全部)。是否真的需要进行这种更改,或者我可以避免修改哈希函数并仍然使用二次探测吗?
编辑: 在阅读了 Eli Bendersky 下面指出的所有内容后,我想我已经有了总体想法。这是http://eternallyconfuzzled.com/tuts/datastructs/jsw_tut_hashtable.aspx<的部分代码< /a>:
15 for ( step = 1; table->table[h] != EMPTY; step++ ) {
16 if ( compare ( key, table->table[h] ) == 0 )
17 return 1;
18
19 /* Move forward by quadratically, wrap if necessary */
20 h = ( h + ( step * step - step ) / 2 ) % table->size;
21 }
有两件事我不明白......他们说二次探测通常是使用c(i)=i^2
完成的。然而,在上面的代码中,它所做的事情更像是 c(i)=(i^2-i)/2
我准备在我的代码上实现这个,但我会简单地这样做:
index = (index + (index^index)) % table_size;
.. .而不是:
index = (index + (index^index - index)/2) % table_size;
如果有的话,我会这样做:
index = (index + (index^index)/2) % table_size;
...因为我已经看到其他代码示例下降了两倍。虽然我不明白为什么...
1)为什么要减去步数?
2) 为什么它会减少 2?
My current implementation of an Hash Table is using Linear Probing and now I want to move to Quadratic Probing (and later to chaining and maybe double hashing too). I've read a few articles, tutorials, wikipedia, etc... But I still don't know exactly what I should do.
Linear Probing, basically, has a step of 1 and that's easy to do. When searching, inserting or removing an element from the Hash Table, I need to calculate an hash and for that I do this:
index = hash_function(key) % table_size;
Then, while searching, inserting or removing I loop through the table until I find a free bucket, like this:
do {
if(/* CHECK IF IT'S THE ELEMENT WE WANT */) {
// FOUND ELEMENT
return;
} else {
index = (index + 1) % table_size;
}
while(/* LOOP UNTIL IT'S NECESSARY */);
As for Quadratic Probing, I think what I need to do is change how the "index" step size is calculated but that's what I don't understand how I should do it. I've seen various pieces of code, and all of them are somewhat different.
Also, I've seen some implementations of Quadratic Probing where the hash function is changed to accommodated that (but not all of them). Is that change really needed or can I avoid modifying the hash function and still use Quadratic Probing?
EDIT:
After reading everything pointed out by Eli Bendersky below I think I got the general idea. Here's part of the code at http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx:
15 for ( step = 1; table->table[h] != EMPTY; step++ ) {
16 if ( compare ( key, table->table[h] ) == 0 )
17 return 1;
18
19 /* Move forward by quadratically, wrap if necessary */
20 h = ( h + ( step * step - step ) / 2 ) % table->size;
21 }
There's 2 things I don't get... They say that quadratic probing is usually done using c(i)=i^2
. However, in the code above, it's doing something more like c(i)=(i^2-i)/2
I was ready to implement this on my code but I would simply do:
index = (index + (index^index)) % table_size;
...and not:
index = (index + (index^index - index)/2) % table_size;
If anything, I would do:
index = (index + (index^index)/2) % table_size;
...cause I've seen other code examples diving by two. Although I don't understand why...
1) Why is it subtracting the step?
2) Why is it diving it by 2?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果表大小是 2 的幂,则有一种特别简单而优雅的方法来实现二次探测:
不是查看原始索引的偏移量 0、1、2、3、4...,而是查看偏移量 0 , 1, 3, 6, 10...(第 ith 探头的偏移量为 (i*(i+1))/2,即它是二次的)。
这保证会命中哈希表中的每个位置(因此,如果有的话,您一定会找到一个空桶)假设表大小是 2 的幂。
这是证明的草图:
(如果表大小不是 2 的幂,则会在第 10 步失败。)
There is a particularly simple and elegant way to implement quadratic probing if your table size is a power of 2:
Instead of looking at offsets 0, 1, 2, 3, 4... from the original index, this will look at offsets 0, 1, 3, 6, 10... (the ith probe is at offset (i*(i+1))/2, i.e. it's quadratic).
This is guaranteed to hit every position in the hash table (so you are guaranteed to find an empty bucket if there is one) provided the table size is a power of 2.
Here is a sketch of a proof:
(If the table size is not a power of 2, this falls apart at step 10.)
您不必修改二次探测的哈希函数。二次探测的最简单形式实际上只是将后续平方添加到计算位置,而不是线性 1、2、3。
有一个很好的资源 此处。以下内容摘自那里。这是使用简单多项式
c(i) = i^2
时最简单的二次探测形式:在更一般的情况下,公式为:
您可以选择常量。
但请记住,二次探测仅在某些情况下有用。正如维基百科条目所述:
编辑:像计算机科学中的许多事情一样,二次探测的精确常数和多项式是启发式的。是的,最简单的形式是
i^2
,但您可以选择任何其他多项式。维基百科给出了h(k,i) = (h(k) + i + i^2)(mod m)
的示例。因此,很难回答你的“为什么”问题。这里唯一的“为什么”是为什么你需要二次探测?对于其他形式的探测和获取聚簇表有问题吗?或者只是一项家庭作业,或者自学?
请记住,到目前为止,哈希表最常见的冲突解决技术是链接或线性探测。二次探测是一种适用于特殊情况的启发式选项,除非您非常清楚自己在做什么,否则我不建议使用它。
You don't have to modify the hash function for quadratic probing. The simplest form of quadratic probing is really just adding consequent squares to the calculated position instead of linear 1, 2, 3.
There's a good resource here. The following is taken from there. This is the simplest form of quadratic probing when the simple polynomial
c(i) = i^2
is used:In the more general case the formula is:
And you can pick your constants.
Keep, in mind, however, that quadratic probing is useful only in certain cases. As the Wikipedia entry states:
EDIT: Like many things in computer science, the exact constants and polynomials of quadratic probing are heuristic. Yes, the simplest form is
i^2
, but you may choose any other polynomial. Wikipedia gives the example withh(k,i) = (h(k) + i + i^2)(mod m)
.Therefore, it is difficult to answer your "why" question. The only "why" here is why do you need quadratic probing at all? Having problems with other forms of probing and getting a clustered table? Or is it just a homework assignment, or self-learning?
Keep in mind that by far the most common collision resolution technique for hash tables is either chaining or linear probing. Quadratic probing is a heuristic option available for special cases, and unless you know what you're doing very well, I wouldn't recommend using it.