选择哈希密钥类型的基本原理
伙计们,我有一个数据结构,它有 25 个不同的键(整数)和一个值。我有这些对象的列表(比如 50000),我打算使用哈希表来存储/检索它们。我计划采取其中一种方法。
根据这 25 个整数键创建一个整数哈希并将其存储在哈希表中。 (是的!我有一些方法来处理冲突)
在各个键上进行字符串连接,并将其用作哈希表的哈希键。例如,如果键值为 1,2,4,6,7,则哈希键将为“12467”。
假设我总共有 50000 条记录,每条记录都有 25 个不同的键和一个值,那么当涉及到检索和插入记录所需的字符串比较成本时,我的第二种方法会不会太过分了?
更多信息!
- 哈希表中的每个桶都是一棵平衡二叉树。
- 我正在使用 boost 库的 hash_combine 方法从 25 个键创建哈希。
Guys, I have a data structure which has 25 distinct keys (integer) and a value. I have a list of these objects (say 50000) and I intend to use a hash table to store/retrieve them. I am planning to take one of these approaches.
Create a integer hash from these 25 integer keys and store it on a hash table. (Yeah! I have some means to handle collisions)
Make a string concatenation on the individual keys and use it as a hash key for the hash table. For example, if the key values are 1,2,4,6,7 then the hash key would be "12467".
Assuming that I have a total of 50000 records each with 25 distinct keys and a value, then will my second approach be a overkill when it comes to the cost of string comparisons it needs to do to retrieve and insert a record?
Some more information!
- Each bucket in the hash table is a balanced binary tree.
- I am using the boost library's hash_combine method to create the hash from the 25 keys.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
绝对使用第一种方法,因为如果使用第二种方法,则需要一个具有
1x10^(25m) 的哈希表,其中 x 是可用键槽的最大长度
。例如,如果键的最大数量是 9999,则
m
将为 4,并且表中需要 1x10^100 个槽。解释:
哈希表背后的想法是,您可以以 O(1) 的效率(不考虑碰撞)随机访问任何元素,因为任何元素的哈希实际上就是它在哈希表中的位置。例如,如果我对对象 X 进行哈希处理并返回 24 的哈希值(或者转换为数字的某个字符串哈希值,结果是 24),我只需转到表的槽 24(通常实现为数组),并且可以检索对象 X。
但是,如果您使用第二种方法(连接 25 个数字 - 我们在这里用数字来简化事情 - 一起形成哈希),则最大的哈希将是 9999999999999999999999999。因此要检索该方法如果要从哈希表中获取对象,您必须从位置 9999999999999999999999999 检索它 - 这意味着您的表必须至少有那么多点。
请记住,对于第一个 - 由于您使用的是二叉树,因此冲突并不是什么大问题。最坏的情况是检索/插入效率为 O(log(n)),无论如何,这并不是那么糟糕。
Absolutely use the first method, because if you use the second , you will require a hash table which has
1x10^(25m), where x is the maximum length of a key
slots available.For example, if the maximum number a key can be is 9999,
m
would be 4 and you'd need 1x10^100 slots in your table.Explanation:
The idea behind a hash table is that you can randomly access any element with an efficiency of O(1) (collisions aside) because any element's hash is infact its position in the hash table. So for example, if I hash Object X and a hash of 24 is returned (or some string hash which is converted to a number, which turns out to be 24), I simply go to slot 24 of my table (often implemented as an array), and can retrieve Object X.
But if you were using your second method (concatenating 25 numbers - we'll say digits to simplify things here - together to make the hash), the largest hash would be 9999999999999999999999999. Therefore to retrieve that object from the hash table, you'd have to retrieve it from position 9999999999999999999999999 - which means your table must have at least that many spots.
And remember, with the first one - since you're using a binary tree, collisions won't really be that big a deal. Worst case scenario will be a retrieval/insertion efficiency of O(log(n)) which isn't really that bad anyways.