需要帮助开始构建我自己的哈希表
我正在学习哈希表,试图了解它们是如何工作的。我想制作一个相当简单的具有单独链接的哈希表(使用数组中的列表)。我有几个问题:
假设密钥可以是任何类型,我会要求用户实现哈希函数,对吧?这可以避免吗?
用户还需要在初始化时提供包含列表(用于冲突)的数组的长度,对吗?这可以避免吗?
如果您有任何其他提示,或者可能是哈希表的 C++ 中的一些清晰的代码示例,我将不胜感激。
感谢您的帮助。
I'm learning about hash tables, trying to understand how they work. I'd like to make a rather simple hash table with separate chaining (using lists in an array). I have a few questions:
Assuming the keys can be of any type, I would require the user to implement the hashing function, right? Can this be avoided?
The user also needs to supply the length of the array that contains the lists (for collisions) at initialisation, correct? Can this be avoided?
If you have any other tips, or maybe some clear code samples in C++ of a hash table I would be thankful.
Thanks for your help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
通常,是的,您需要客户端指定哈希函数,因为如果您正在编写通用哈希表并在任意类型 T 上进行操作,您将不知道如何以语义上有意义的方式对其进行哈希处理。您可以通过对所存储元素的类型和哈希函数的类进行参数化来实现此目的。例如:
在这里,您可以使用模板的默认参数来选择默认的哈希函数,除非用户另有指定。
虽然客户端通常可以指定表的初始大小,但这不是必需的。您可以对存储桶的数量进行有根据的猜测(例如,最初使用 17 个存储桶),并随着负载系数的增加而增长表。这类似于 std::vector 的工作原理:实现可以选择默认大小,但如果客户端显式请求预设大小的向量或调用 Reserve ,该实现接受用户的提示。例如,您可以有一个以下形式的构造函数:
这样,客户端就可以构造一个具有默认存储桶数量的哈希表,或者如果他们了解他们想要的存储桶数量,则可以将其指定为范围。但是,您可能还希望将存储桶隐藏为详细信息,并让客户端指定他们希望放入表中的元素数量,然后让您的类为此在幕后进行计算。这使得在幕后切换实现变得更容易,因此,如果您想使用动态完美哈希表之类的东西而不是链式哈希,则该类可以处理计算初始大小的复杂性。
至于代码示例,我不确定如何在不放弃构建哈希表所涉及的大量复杂性的情况下提供任何代码示例。 :-) 如果您对某段特定代码感兴趣,但自己编写时遇到困难,请随时将其作为单独的问题发布,以便获得更有针对性的反馈。
希望这有帮助!
Typically, yes, you would need the client to specify the hash function, since if you are writing a generic hash table and are operating on an arbitrary type T, you can't know how to hash it in a way that is semantically meaningful. You can do this by parameterizing the class on both the type of the elements being stored and the hash function. For example:
Here, you can use default arguments with the templates to select the default hash function unless the user specifies otherwise.
While the client typically can specify the initial size of the table, it's not required. You could make an educated guess about the number of buckets (say, initially use 17 buckets), growing the table as the load factor increases. This is similar, say, to how
std::vector
works: the implementation can pick a default size, but if the client either explicitly asks for a presized vector or callsreserve
, the implementation takes the hint from the user. For example, you could have a constructor of the formThis way, the client can just construct a hash table with a default number of buckets, or if they have a sense of the number of buckets they'd like they can specify it as a parameter. However, you might also want to hide the buckets as a detail and just have the client specify how many elements they're expecting to put into the table, then have your class do a computation behind the scenes for this. This makes it easier to switch implementations behind-the-scenes, so that if you want to use something like a dynamic perfect hash table instead of chained hashing the class can handle the complexity of computing an initial size.
As for code examples, I'm not sure how to provide any without giving away a lot of the complexity involved in building the hash table. :-) If there's a specific piece of code you're interested in and are having trouble writing on your own, feel free to post it as a separate question so that you can get more targeted feedback.
Hope this helps!
查看哈希算法的各种实现。有很多可用的。如果需要,您可以使哈希存储可变。由你决定。对于大多数哈希算法,可能会发生冲突。请记住这一点。
Look at various implementations of hashing algorithms. There are many available. You can have the hash storage be variable if you want. It's up to you. With most hashing algorithms, collisions can be possible. Keep this in mind.
如果您想了解哈希表,我建议您从最简单的开始,即线性链接哈希表
在这里您考虑一个固定大小的指针数组(指向您定义的类型项目的指针),您的项目将存储在其中,考虑还有一些计数器,一个用于跟踪数组的长度,另一个用于跟踪添加到表中的项目数,直到某一时刻
您可以使用宏来定义向表添加项目的功能像这样的哈希表
,其中 k 是要添加的项目的索引(键),M 是一个质数,表示指针数组的大小(例如 13、29、41、543),h(k) 将为您提供项目将要到达的位置被插入。
您应该考虑使用固定大小的指针数组的原因是因为这样会产生
成本。
因此,从这个简单的函数开始,您可以修改它并考虑实现一些更复杂的函数,例如开放寻址、线性探测、二次探测和双散列
至于用户请求的有关项目数量的输入,我完全同意 templatetypedef 之前所说的
If you are trying to learn about hash tables I suggest you to start with the most simple ones wich is the linear chaining hash tables
Here you consider an fixed size array of pointers(pointers to type item defined by you) where your items will be stored,consider also having some counters, one to keep track of the length of the array and another one to keep track of the number of items added to the table until a certain moment
you could use a macro to define the function of adding items to the hash table like this
where k is the index(key) of the item to be added and M is a prime number rappresenting the size of the array of pointers (eg 13, 29, 41, 543) and h(k) will give you the position where the item is going to be inserted.
The reason you should consider using an fixed size array of pointers is because this way you have
cost.
Therefore starting from this simple function you can then modify it and consider implementing some more complex ones like open adressing, linear probing, quadratic probing and double hashing
As for the user requested input regarding the number of items i totally agree with what templatetypedef said earlier