设计一个哈希表
我在面试中被问到这个问题,并被难住了,尽管我想出了一个答案,但我对我的解决方案感到不满意。我想看看这里的专家对这个问题有何看法。
我完全引用了面试官提出的问题。 “设计一个哈希表,您可以使用任何您想要的数据结构。我想看看您如何实现 O(1) 查找时间”。最后他说这更像是通过另一个数据结构模拟哈希表。
谁能给我提供有关这个问题的更多信息。谢谢!
PS:我提出这个问题的主要原因是想知道专业设计师如何开始解决这个问题的设计&&另一件事是,我根据提出的其他问题以某种方式通过了面试,但这个问题在我的脑海中,我想找到答案!
I was asked this question in an Interview and was left stumped, even though i came up with an answer I didn't feel comfortable with my solution. I wanted to see how experts here feel about this question.
I am exactly quoting the question as it came out of the Interviewer. "Design a Hash-table, You can use any data-structure you can want. I would like to see how you implement the O(1) look up time". Finally he said It's more like simulating a Hash-table via another Data-structure.
Can anyone light me with more information on this question. Thanks!
PS: Main reason for me putting this question is to know how an expert designer would start off with the Design for this problem && one more thing I cleared the interview somehow based on the other questions that were asked but this question was in my mind and I wanted to find out the answer!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
哈希表提供了一种高效(通常在常数/O(1) 时间内)插入和检索数据的方法。为此,我们使用一个非常大的数组来存储目标值和一个哈希函数,该函数通常将目标值映射到哈希值中,哈希值只不过是这个大数组中的有效索引。将要存储到唯一键(或表中的索引)的值完美散列的散列函数称为完美散列函数。但在实践中,为了存储这些没有已知方法来获取唯一哈希值(表中的索引)的值,我们通常使用哈希函数,该函数可以将每个值映射到特定索引,以便将冲突保持在最低限度。这里的冲突是指哈希表中存储的两个或多个项映射到相同的哈希值。
现在回到原来的问题,即:
“设计一个哈希表,您可以使用任何您想要的数据结构。我想看看您如何实现 O(1) 查找时间”。最后他说,这更像是通过另一个数据结构来模拟哈希表。“
如果我们可以设计一个完美的哈希函数,那么查找就可以在 O(1) 时间内完成。底层的数据结构仍然是一个数组。但这取决于要存储的值,例如,考虑字符串到英文字母,因为没有已知的哈希函数可以将每个有效的英文单词映射到唯一的 int (32 位)。 )(或 long long int 64 位)值,因此总会存在一些冲突,为了处理冲突,我们可以使用单独的链接方法进行冲突处理,其中每个哈希表槽存储一个指向链表的指针,该链表实际上存储了所有内容。例如,考虑一个哈希函数,该函数将每个英文字母字符串视为基于 26 的数字(因为英文字母中有 26 个字符),这可以编码为:
其中 tableSize 是一个适当的值。选择的素数刚好大于要存储在哈希表中的英语词典单词的总数。
以下是大小为 144554 的字典和大小 = 144563 的表的结果:
[映射到同一单元格的项目 -->哈希表中此类槽的数量] =======>
在这种情况下,要搜索已映射到仅包含一项的单元格的项,查找时间复杂度为 O(1),但如果它映射到包含超过 1 个项的单元格,则我们必须迭代此链接列表可能包含 2 到 7 个节点,然后我们就能找到该元素。所以在这种情况下它不是恒定的。
因此,是否可以在 O(1) 约束下执行查找,仅取决于完美哈希函数的可用性。否则,它不会完全是 O(1),但非常接近它。
A hash table provides a way to insert and retrieve data efficiently (usually in constant/O(1)) time. For this we use an very large array to store the the target values and a hash function which usually maps the target values, into hash values which is nothing else but the valid indices in this large array. A hash function which perfectly hashes a values to be stored into a unique key (or index in the table) is known as a perfect hash function. But in practice to store such values for which there is no known way to obtain unique hash values (indices in the table) we usually use a hash function which can map each value to particular index so that collision can be kept minimum. Here collision means that two or more items to be stored in the hash table map to the same hash value.
Now coming at the original questions, which is:
"Design a Hash-table, You can use any data-structure you can want. I would like to see how you implement the O(1) look up time". Finally he said It's more like simulating a Hash-table via another Data-structure."
Look up is possible in exactly O(1) time, in case we can design a perfect hash function. The underlying data-structure is still an array. But it depends upon the values to be stored, whether we can design a perfect hash function or not. For example consider strings to English alphabet. Since there is no known hash function which can map each valid English word to a unique int (32 bit) (or long long int 64 bit) value, so there will always be some collisions. To deal with collision we can use separate chaining method of collision handling in which each hash table slot stores a pointer to the linked list, which actually stores all the item hashing to that particular slot or index. For example consider a hash function which considers each English alphabet string as a number on base 26 (because there are 26 characters in English alphabet), This can be coded as:
Where tableSize is an appropriately chosen prime number just greater than the total number of English dictionary words targeted to be stored in the hash table.
Following are the results with dictionary of size 144554, and table of size = 144563:
[Items mapping to same cell --> Number of such slots in the hash table ] =======>
In this case to search the items which have been mapped to cells containing only one item, the lookup will be O(1), but in case it maps to a cell which has more than 1 items, then we have to iterate through this linked list which might contain 2 to 7 nodes and then we will be able to find out that element. So its not constant in this case.
So it depends upon the availability of perfect hash function only, whether we the lookup can be performed in O(1) constraint. Otherwise it will not be exactly O(1) but very close to it.
考虑宇宙U(例如,所有可能的IP地址、或所有可能的名称、或所有可能的移动电话号码、或所有可能的棋盘配置)。您可能已经注意到宇宙 U 非常大。
集合S的大小是合理的S⊆U。所以,这个集合S的大小是合理的,就像你保存朋友的电话号码一样。
选择要实施的数据结构
没有数据结构,我们将无法得到好的解决方案。我们可以使用数组来快速插入、删除和查找,但是它会占用大量空间,因为宇宙的大小非常大。另外,你的朋友名字应该是整数,并且空间需求与宇宙成正比。
另一方面,我们可以使用链表。这只会占用与对象(即集合 S)一样多的空间,但 3 个操作不会是 O(1)。为了解决这个问题,我们可以同时使用两者。
因此,解决方案是利用两全其美的方法,即数组的快速查找和链接列表等小型存储。
但是,这些现实世界的实体需要通过哈希函数转换为整数,以便它们可以用作数组索引。所以,假设你想保存你朋友的名字 alice,只需将他的名字转换为整数
插入爱丽丝:
<代码>
int k = hashFunc(alice);
arr[k] = Alice //这需要 O(1) 时间
查找 alice:
<代码>
int k = hashFunc(alice);
字符串名称 = arr[k] ;
print name;//打印 alice
当然事情没那么简单,但这就是我现在可以解释的。有不清楚的地方请告诉我。谢谢。有关哈希表的更多信息,请参阅此处
Consider the Universe U (e.g. all possible IP address, or all possible names or all possible mobile numbers or all possible chess board configuration). You might have noticed that universe U is very large.
Set S is of reasonable size S⊆ U. So, this set S is of reasonable size, like you keeping phone number of your friends.
Selecting data-structure for implementation
Without data-structure, we will not get good solution. We could use an array for quick insert, deletion, and lookup, but it will take up a lot of space,as the size of universe is very large. Also, your friend name should be integer and space requirement is proportional to universe.
On the other hand, we could use a linked list. This would only take up as much as space as there are objects i.e. Set S, but the 3 operations would not be O(1). To resolve this, we can use both.
So, the solution is to use the best of both worlds, i.e. fast lookup of arrays and small storage of size like link list.
But, these real world entities needs to be changed to integer, by something called hash function, so that they can be used as array index. So, suppose you want to save your friend's name alice, just convert his name to integer
Inserting alice:
int k = hashFunc(alice);
arr[k] = Alice //this takes O(1) time
Lookup for alice:
int k = hashFunc(alice);
string name = arr[k] ;
print name;//prints alice
Of-course it is not that simple, but this is what I can explain right now. Please let me know wherever I am not clear.Thanks. For more information on hash table refer here
如果我处于你的立场,我应该做以下事情:
If I would have been in your shoes, I should have done the following:
使用数组 => O(1)
因此,您可以使用哈希函数将密钥转换为数字,然后使用该数字作为数组的索引来检索该值。
Use an array => O(1)
So you'd use a hash function to turn your key into a number, then use that number as an index into an array to retrieve the value.
您需要一个列表数组或“桶”来存储您的值。然后使用哈希函数来确定要查找的数组元素,最后对那里的列表元素进行线性搜索。
您可以不断查找数组位置,并对小列表中的哈希值进行线性搜索。
You need an array of lists, or "buckets" for your values. Then you use a hash function to determine which array element to look in, and finally do a linear search through the list elements there.
You have constant lookup of the array location, and linear search of the hash values in the small list there.
这是一个相当标准的面试问题,表明您了解有用的 Java 数据结构的基本概念,例如 HashSet 和 HashMap。
您将使用一系列列表,这些列表通常称为存储桶。您以给定容量 n 开始哈希表,这意味着您有一个包含 10 个列表的数组(全部为空)。
要将一个对象添加到你的 hastable 中,你可以调用对象
hashCode
函数,它会为你提供一个 int (一个相当大范围内的数字)。因此,您必须将 hashCode 取模到 n 才能得到它所在的存储桶。将该对象添加到该存储桶中列表的末尾。要查找对象,您需要再次使用 hashCode 和 mod 函数来查找存储桶,然后需要使用
.equals()
迭代列表以查找正确的对象。随着表变得越来越满,您最终将进行越来越多的线性搜索,因此您最终将需要重新散列。这意味着建立一个全新的、更大的桌子并将对象再次放入其中。
如果您想要的存储桶已满,您可以重新计算不同的存储桶位置,而不是在每个数组位置中使用列表,一种常见的方法是 二次探测。这样做的优点是不需要任何动态数据结构(例如列表),但更复杂。
It's a fairly standard interview question that shows you understand the underlying concepts being useful Java data structures, like HashSets and HashMaps.
You would use an array of lists, these are normally referred to as buckets. You start your hashtable with a given capacity n meaning you have a array of 10 lists (all empty).
To add an object to your hastable you call the objects
hashCode
function which gives you an int (a number in a pretty big range). So you then have to modulo the hashCode wrt to n to give you the bucket it lives in. Add the object to the end of the list in that bucket.To find an object you again use the hashCode and mod function to find the bucket and then need to iterate through the list using
.equals()
to find the correct object.As the table gets fuller, you will end up doing more and more linear searching, so you will eventually need to re-hash. This means building an entirely new, larger table and putting the objects into it again.
Instead of using a List in each array position you can recalulate a different bucket position if the one you want is full, a common method is quadratic probing. This has the advantage of not needed any dynamic data structures like lists but is more complicated.