I was taught that hashmaps are O(1) (ignoring collisions). This was explained to me as follows:
A range in memory is reserved for the hashmap. A key is hashed to a seemingly random address in this range. We store the key-value pair at that address. The possibility that multiple keys can hash to the same address is solved either by rehashing every time such a collision occurs (open addressing) or by having each occupied address store a pointer to a linked list of everything that hashed there (chaining).
A key can later be hashed to check whether a matching key-value pair is present (reusing the same collision resolution applied during storage, if necessary). If the key is found, its value is returned.
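To make that scheme concrete, here is a minimal sketch of the chaining variant in Python (the class and method names are mine, purely for illustration, not any particular library's API):

```python
class ChainedHashMap:
    """Minimal sketch: hash each key to a bucket in a reserved range;
    resolve collisions by chaining key-value pairs per bucket."""

    def __init__(self, capacity=16):
        # One chain per slot. Note that building this touches every slot.
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        # The hash picks a seemingly random slot in the reserved range.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already stored: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or a collision): append

    def get(self, key, default=None):
        for k, v in self._bucket(key):   # walk this slot's chain
            if k == key:
                return v
        return default
```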
But if a range of memory is assigned to a hashmap, there is a chance that the bits already sitting there would mimic a key-value pair being present. So I think the hashmap's memory range must be sanitized on instantiation (or even earlier...?). Since that range cannot be significantly smaller than the number of items to be stored, wouldn't that sanitization be O(n)? Does modern hardware solve this by providing an instruction that fills a range of memory with a repeated value? If so, did the advent of such an instruction make hashmaps viable? Otherwise, I don't understand how this works. Sure, this O(n) would be a one-time cost at instantiation. But other storage methods can avoid ever doing anything slower than O(log(n)). Please help me: what am I missing?
Comments (1)
A typical HashMap implementation will allocate a small table when it's constructed. Then when the number of items exceeds some constant factor of the table size, it will allocate a table twice as big and move all the keys over to the new one.
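Sketched in Python, building on the ChainedHashMap sketch from the question (the 0.75 load factor is an illustrative choice, not a claim about any specific implementation):

```python
class ResizingHashMap(ChainedHashMap):
    """Sketch of the doubling strategy: grow the table once the item
    count exceeds a constant factor of its size, rehashing every key."""

    def __init__(self, capacity=8):
        super().__init__(capacity)
        self.count = 0

    def put(self, key, value):
        if self.count >= 0.75 * len(self.buckets):
            self._resize(2 * len(self.buckets))
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1

    def _resize(self, new_capacity):
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(new_capacity)]
        for bucket in old_buckets:
            for key, value in bucket:    # every existing key moves: O(n)
                self._bucket(key).append((key, value))
```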
This doubling strategy gives an amortized O(1) insertion time, but an actual worst-case insertion cost of O(n) whenever the table has to be reallocated. Dealing with collisions degrades that "amortized" O(1) time to "expected" O(1) time.
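To see where the amortized O(1) comes from: starting from a small table and doubling as needed, inserting n items triggers rehashes that move roughly 1, 2, 4, ..., n/2 keys. The resizing work over all n insertions therefore totals 1 + 2 + 4 + ... + n/2 < n, which is O(n) overall, i.e. O(1) per insertion on average.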
It's possible, however, to move the keys over to the reallocated table incrementally, spreading out that cost so that it really is O(1) per operation. This is rarely done, though, because it doesn't guarantee O(1) time -- there's that collision problem again, and there is no guarantee that you can even allocate memory in constant time.
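A sketch of what that incremental scheme could look like, again in Python with chaining (the two-buckets-per-operation pace and all the names are illustrative; Redis's dict uses a similar incremental-rehashing idea):

```python
class IncrementalHashMap:
    """Sketch of incremental resizing with chaining: during a resize,
    keep both tables and migrate a few old buckets per operation."""

    BUCKETS_PER_STEP = 2  # illustrative pace

    def __init__(self, capacity=8):
        self.new = [[] for _ in range(capacity)]
        self.old = None        # table being drained, if a resize is underway
        self.next_old = 0      # next old bucket to migrate
        self.count = 0

    def _slot(self, table, key):
        return table[hash(key) % len(table)]

    def _tables(self):
        return (t for t in (self.new, self.old) if t is not None)

    def _migrate_step(self):
        for _ in range(self.BUCKETS_PER_STEP):
            if self.old is None:
                return
            for pair in self.old[self.next_old]:
                self._slot(self.new, pair[0]).append(pair)
            self.old[self.next_old] = []   # don't find these entries twice
            self.next_old += 1
            if self.next_old == len(self.old):
                self.old = None            # migration complete

    def put(self, key, value):
        self._migrate_step()
        if self.old is None and self.count >= 0.75 * len(self.new):
            # Caveat from above: this allocation itself may not be O(1).
            self.old, self.new = self.new, [[] for _ in range(2 * len(self.new))]
            self.next_old = 0
        self.delete(key)                   # drop any stale copy first
        self._slot(self.new, key).append((key, value))
        self.count += 1

    def get(self, key, default=None):
        for table in self._tables():       # check the new table, then the old
            for k, v in self._slot(table, key):
                if k == key:
                    return v
        return default

    def delete(self, key):
        for table in self._tables():
            slot = self._slot(table, key)
            for i, (k, _) in enumerate(slot):
                if k == key:
                    del slot[i]
                    self.count -= 1
                    return
```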
That answers your question about hash maps, but your original line of thinking about the cost of initialization doesn't hold even in theory. See this answer: Initializing an Array in the Context of Studying Data Structures
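For reference, the classic trick here (presumably what that answer covers) is a structure over uninitialized memory that still supports O(1) reads and writes, so no sanitizing pass is ever needed. A Python sketch (Python cannot actually leave memory uninitialized, so the two `[0] * n` allocations below merely stand in for raw, garbage-filled memory, e.g. from C's malloc):

```python
class NoInitArray:
    """Array over 'uninitialized' memory: O(1) setup (given raw memory)
    plus O(1) get/set. A cell is trusted only if its stamp points into
    the witness list and the witness points back at the cell."""

    def __init__(self, n, default=0):
        self.default = default
        self.data = [0] * n    # stand-in for uninitialized memory
        self.stamp = [0] * n   # stand-in for uninitialized memory
        self.witness = []      # indices genuinely written, in write order

    def _written(self, i):
        s = self.stamp[i]      # may be garbage; validate before trusting
        return 0 <= s < len(self.witness) and self.witness[s] == i

    def get(self, i):
        return self.data[i] if self._written(i) else self.default

    def set(self, i, value):
        if not self._written(i):
            self.stamp[i] = len(self.witness)
            self.witness.append(i)
        self.data[i] = value
```

Any leftover garbage in data or stamp can never masquerade as a stored entry, because a garbage stamp fails the two-way check against witness. That is the piece missing from the question's reasoning about sanitization.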