
Design a hash table

Posted 2024-10-26 13:01:31

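I was asked this question in an interview and was stumped. I did come up with an answer, but I was not satisfied with my solution. I would like to see what the experts here think about this question.

I am quoting the interviewer's question exactly: "Design a hash table. You can use any data structure you want. I would like to see how you implement the O(1) lookup time." Finally he said it is more like simulating a hash table via another data structure.

Can anyone give me more information on this question? Thanks!

PS: The main reason I am asking is to learn how an expert designer would start approaching this problem. Also, I somehow got through the interview on the strength of the other questions, but this one stuck in my mind and I want to find the answer!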


相思碎 2024-11-02 13:04:17

A hash table provides a way to insert and retrieve data efficiently, usually in constant, O(1), time. For this we use a very large array to store the target values, together with a hash function that maps each target value to a hash value, which is nothing more than a valid index into this large array. A hash function that maps every value to be stored to a unique key (or index in the table) is known as a perfect hash function. In practice, for values where there is no known way to obtain unique hash values (indices in the table), we usually use a hash function that maps each value to an index in such a way that collisions are kept to a minimum. Here a collision means that two or more items to be stored in the hash table map to the same hash value.
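As a minimal sketch of the idea above, assume the keys are restricted to the single lowercase letters 'a' to 'z' (contiguous in ASCII). Then `key - 'a'` is a perfect hash function, and a plain array gives exactly O(1) lookup. The name `LetterTable` and its methods are hypothetical, chosen only for illustration, and the code assumes C++17 for `std::optional`.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical direct-addressed table: keys are 'a'..'z', values are ints.
class LetterTable {
public:
    static bool valid(char key) { return key >= 'a' && key <= 'z'; }

    // Perfect hash for this key set: every key gets its own index 0..25.
    static std::size_t index(char key) { return static_cast<std::size_t>(key - 'a'); }

    // Keys outside 'a'..'z' are ignored.
    void insert(char key, int value)
    {
        if (valid(key)) slots_[index(key)] = value;
    }

    // One array access and no possible collision: exactly O(1).
    std::optional<int> find(char key) const
    {
        if (!valid(key)) return std::nullopt;
        return slots_[index(key)];
    }

private:
    std::array<std::optional<int>, 26> slots_{};
};
```

This only works because the key set is tiny and known in advance; the rest of the answer deals with the case where no such function is available.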

Now coming to the original question, which is:
"Design a hash table. You can use any data structure you want. I would like to see how you implement the O(1) lookup time." Finally he said it is more like simulating a hash table via another data structure.

Lookup is possible in exactly O(1) time if we can design a perfect hash function. The underlying data structure is still an array. Whether a perfect hash function can be designed depends on the values to be stored. For example, consider strings over the English alphabet. There is no obvious hash function that maps every valid English word to a unique 32-bit int (or 64-bit long long int) value that also fits as an index into a table of practical size, so in general there will be some collisions. To deal with collisions we can use separate chaining, in which each hash table slot stores a pointer to a linked list that holds all the items hashing to that particular slot or index. For example, consider a hash function that treats each string of English letters as a number in a fixed base, in the spirit of base 26 (because there are 26 characters in the English alphabet); note that the shifts as written multiply by 16 + 8 + 4 = 28. This can be coded as:

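#include <cctype>
#include <string>

const unsigned int tableSize = 144563;  // table size used for the results below

unsigned int hash(const std::string& word)
{
    unsigned int key = 0;
    for (std::string::size_type i = 0; i < word.length(); ++i)
    {
        // Lowercase each character on the fly; word is const and cannot be modified in place.
        unsigned char c = static_cast<unsigned char>(word[i]);
        // (key<<4) + (key<<3) + (key<<2) is key * 28.
        key = (key << 4) + (key << 3) + (key << 2) + std::tolower(c);
        key = key % tableSize;
    }
    return key;
}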

Here tableSize is an appropriately chosen prime number just greater than the total number of English dictionary words to be stored in the hash table.

The following are the results for a dictionary of 144554 words and a table of size 144563:

[ Items mapping to the same slot --> Number of such slots in the hash table ]

[ 0 --> 53278 ]
[ 1 --> 52962 ]
[ 2 --> 26833 ]
[ 3 -->  8653 ]
[ 4 -->  2313 ]
[ 5 -->   437 ]
[ 6 -->    78 ]
[ 7 -->     9 ]
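A distribution like the one above could be tallied along the following lines. This is a sketch, not the code that produced those numbers: `hashWord` mirrors the hash shown earlier but takes the table size as a parameter, and `occupancyHistogram` is a hypothetical helper name.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same scheme as the hash above (multiply by 28, add the lowercased
// character), with the table size passed in instead of read from a global.
unsigned int hashWord(const std::string& word, unsigned int tableSize)
{
    unsigned int key = 0;
    for (unsigned char c : word) {
        key = key * 28u + static_cast<unsigned int>(std::tolower(c));
        key %= tableSize;
    }
    return key;
}

// Returns a map: items per slot -> number of slots holding that many items.
std::map<std::size_t, std::size_t> occupancyHistogram(
    const std::vector<std::string>& words, unsigned int tableSize)
{
    // Count how many words land in each slot.
    std::vector<std::size_t> perSlot(tableSize, 0);
    for (const std::string& w : words) {
        ++perSlot[hashWord(w, tableSize)];
    }
    // Count how many slots share each occupancy value.
    std::map<std::size_t, std::size_t> histogram;
    for (std::size_t count : perSlot) {
        ++histogram[count];
    }
    return histogram;
}
```

Feeding it a word list and the table size yields one row per occupancy value, in the same shape as the table above.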

In this case, searching for an item that maps to a slot containing only one item is O(1). But if it maps to a slot holding more than one item, we have to iterate through that slot's linked list, which might contain 2 to 7 nodes, before we find the element. So lookup is not strictly constant in this case.
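The lookup just described can be sketched as a small separate-chaining table. This is one possible layout under stated assumptions, not a definitive implementation: `ChainedTable` is a hypothetical name, the values are plain ints, the slot count must be greater than zero, and `std::hash<std::string>` stands in for the custom hash function shown earlier. It assumes C++17.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical string -> int table using separate chaining.
class ChainedTable {
public:
    // slotCount must be greater than zero.
    explicit ChainedTable(std::size_t slotCount) : slots_(slotCount) {}

    // Inserts the pair, or overwrites the value if the key already exists.
    void insert(const std::string& key, int value)
    {
        auto& chain = slots_[slotFor(key)];
        for (auto& entry : chain) {
            if (entry.first == key) { entry.second = value; return; }
        }
        chain.emplace_back(key, value);
    }

    // Walks only the chain of the key's slot: O(1 + chain length).
    std::optional<int> find(const std::string& key) const
    {
        for (const auto& entry : slots_[slotFor(key)]) {
            if (entry.first == key) return entry.second;
        }
        return std::nullopt;
    }

private:
    std::size_t slotFor(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % slots_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> slots_;
};
```

With chains of at most 7 nodes, as in the results above, each lookup touches only a handful of entries, which is why the cost stays close to constant even without a perfect hash function.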

So whether lookup can be performed in exactly O(1) depends only on the availability of a perfect hash function. Otherwise it will not be exactly O(1), but very close to it.
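To put a rough number on "very close", the reported distribution can be turned into an average cost. The sketch below assumes that every stored item is equally likely to be searched for and that a chain is scanned from its head; `averageComparisons` and `reported` are hypothetical names, and the code assumes C++17.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Average key comparisons for a successful lookup.
// Input: pairs of (items per slot, number of slots with that many items).
double averageComparisons(
    const std::vector<std::pair<std::size_t, std::size_t>>& histogram)
{
    double totalItems = 0.0;
    double totalComparisons = 0.0;
    for (const auto& [itemsPerSlot, slotCount] : histogram) {
        // In a chain of k items, finding the i-th one costs i comparisons,
        // so all k lookups together cost 1 + 2 + ... + k = k(k+1)/2.
        totalItems += static_cast<double>(itemsPerSlot * slotCount);
        totalComparisons += static_cast<double>(
            slotCount * itemsPerSlot * (itemsPerSlot + 1) / 2);
    }
    return totalItems == 0.0 ? 0.0 : totalComparisons / totalItems;
}

// The distribution reported above.
const std::vector<std::pair<std::size_t, std::size_t>> reported = {
    {0, 53278}, {1, 52962}, {2, 26833}, {3, 8653},
    {4, 2313},  {5, 437},   {6, 78},    {7, 9},
};
```

Under those assumptions, `averageComparisons(reported)` comes out at roughly 1.5 comparisons per successful lookup, independent of how many words are stored as long as the load stays similar.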
