How to create a collection with O(1) complexity

I would like to create a data structure or collection which will have O(1) complexity for adding, removing, and counting the number of elements. How am I supposed to start?

I have thought of a solution: I will use a Hashtable, and each key/value pair inserted will have exactly one hash code; that is, my hash code algorithm will generate a unique hash value every time, so the index at which the value is stored will be unique (i.e. no collisions).

Will that give me O(1) complexity?
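
As a starting point, here is a minimal sketch of the kind of collection being asked about, assuming Java (the language the answers below discuss). The class and method names are illustrative, not from the question: add and remove are O(1) on average with java.util.HashMap, and size() is O(1) because the map keeps its count in a field.

```java
import java.util.HashMap;
import java.util.Map;

public class O1Collection<K, V> {
    // HashMap gives O(1) *average-case* insertion and removal;
    // size() is O(1) because the map tracks its element count in a field.
    private final Map<K, V> map = new HashMap<>();

    public void add(K key, V value) { map.put(key, value); } // O(1) average
    public void remove(K key)       { map.remove(key); }     // O(1) average
    public int size()               { return map.size(); }   // O(1)

    public static void main(String[] args) {
        O1Collection<String, Integer> c = new O1Collection<>();
        c.add("a", 1);
        c.add("b", 2);
        c.remove("a");
        System.out.println(c.size()); // prints 1
    }
}
```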

Comments (7)

不一样的天空 2024-09-18 18:07:24

Yes, that will work, but as you mentioned, your hashing function needs to be 100% unique. Any duplicates will result in you having to use some sort of collision resolution. I would recommend separate chaining (a linked list per bucket).

Edit: HashMap.size() allows for O(1) access

Edit 2: Response to the confusion Larry has caused =P

Yes, hashing is O(k), where k is the key length. Everyone can agree on that. However, if you do not have a perfect hash, you simply cannot get O(1) time. Your claim was that you do not need uniqueness to achieve O(1) deletion of a specific element. I guarantee you that is wrong.

Consider a worst-case scenario: every element hashes to the same thing. You end up with a singly linked list, which, as everyone knows, does not have O(1) deletion. I would hope, as you mentioned, that nobody is dumb enough to make a hash like this.

Point is, uniqueness of the hash is a prerequisite for O(1) runtime.

Even then, though, it is technically not O(1) in Big O terms. Only by using amortized analysis will you achieve constant-time efficiency in the worst case. As noted in Wikipedia's article on amortized analysis:

The basic idea is that a worst case operation can alter the state in such a way that the worst case cannot occur again for a long time, thus "amortizing" its cost.

That is referring to the idea that resizing your hashtable (altering the state of your data structure) at certain load factors can ensure a smaller chance of collisions etc.

I hope this clears everything up.
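
To make the worst case above concrete, here is a toy separately-chained table (a sketch with illustrative names, not production code): every key lands in a per-bucket linked list, so if all keys hash to the same bucket, remove() degrades to a linear scan of one long chain, while size() stays O(1) thanks to a counter.

```java
import java.util.LinkedList;

public class ChainedTable<K> {
    private final LinkedList<K>[] buckets; // one chain per bucket
    private int size = 0;                  // tracked so size() is O(1)

    @SuppressWarnings("unchecked")
    public ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int index(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void add(K key) {
        buckets[index(key)].add(key);
        size++;
    }

    // O(1) only if the chain holds one element; O(n) if everything collided.
    public boolean remove(K key) {
        boolean removed = buckets[index(key)].remove(key);
        if (removed) size--;
        return removed;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        ChainedTable<String> t = new ChainedTable<>(16);
        t.add("x");
        t.add("y");
        t.remove("x");
        System.out.println(t.size()); // prints 1
    }
}
```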

情痴 2024-09-18 18:07:24

Adding, removing, and size (provided the count is tracked separately, using a simple counter) can all be provided by a linked list, unless you need to remove a specific item. You should be more specific about your requirements.
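
A sketch of that suggestion, assuming removal only ever happens at the head (names are illustrative): add, removeFirst, and size are all O(1), but removing a specific element would still require an O(n) scan.

```java
import java.util.NoSuchElementException;

public class CountedList<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<T> head;
    private int count = 0; // simple counter so size() is O(1)

    public void add(T value) {            // O(1): push onto the head
        head = new Node<>(value, head);
        count++;
    }

    public T removeFirst() {              // O(1): pop the head
        if (head == null) throw new NoSuchElementException();
        T v = head.value;
        head = head.next;
        count--;
        return v;
    }

    public int size() { return count; }   // O(1)

    public static void main(String[] args) {
        CountedList<String> list = new CountedList<>();
        list.add("a");
        list.add("b");
        list.removeFirst();
        System.out.println(list.size()); // prints 1
    }
}
```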

够运 2024-09-18 18:07:24

Doing a totally non-clashing hash function is quite tricky even when you know exactly the space of things being hashed, and it's impossible in general. It also depends deeply on the size of the array that you're hashing into. That is, you need to know exactly what you're doing to make that work.

But if you instead relax that a bit so that identical hash codes don't imply equality¹, then you can use the existing Java HashMap framework for all the other parts. All you need to do is plug your own hashCode() implementation into your key class, which is something Java has always supported. And make sure that you've got equality defined right too. At that point, the various operations cost not much more than O(1), especially if you've got a good initial estimate for the capacity and load factor.

¹ Equality must imply equal hash codes, of course.
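
A minimal sketch of that advice, using a hypothetical UserId key class: hashCode() is overridden, equals() is kept consistent with it, and the map is created with an initial capacity and load factor chosen up front.

```java
import java.util.HashMap;
import java.util.Map;

public final class UserId {
    private final int id;

    public UserId(int id) { this.id = id; }

    // Your own hash goes here; equal ids must yield equal hash codes.
    @Override public int hashCode() { return id; }

    // Equality defined consistently with hashCode().
    @Override public boolean equals(Object o) {
        return o instanceof UserId && ((UserId) o).id == this.id;
    }

    public static void main(String[] args) {
        // Initial capacity and load factor estimated up front, as suggested.
        Map<UserId, String> m = new HashMap<>(64, 0.5f);
        m.put(new UserId(42), "alice");
        System.out.println(m.get(new UserId(42))); // prints "alice"
    }
}
```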

独自唱情﹋歌 2024-09-18 18:07:24

Even if your hash codes are unique, this doesn't guarantee a collision-free collection. This is because your hash map is not of unlimited size. The hash code has to be reduced to the number of buckets in your hash map, and after this reduction you can still get collisions.

e.g. Say I have three objects A (hash: 2), B (hash: 18), and C (hash: 66), all unique.
Say you put them in a HashMap with a capacity of 16 (the default). If they are mapped to a bucket with % 16 (it's actually more complex than this), then after reducing the hash codes we now have A (hash: 2 % 16 = 2), B (hash: 18 % 16 = 2), and C (hash: 66 % 16 = 2).
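
A quick demonstration of that reduction (java.util.HashMap actually spreads the high bits and masks with capacity - 1 rather than using %, but the effect is the same):

```java
public class BucketDemo {
    public static void main(String[] args) {
        int capacity = 16;          // the HashMap default
        int[] hashes = {2, 18, 66}; // unique hash codes from the example
        for (int h : hashes) {
            // Simplified reduction; the real HashMap mixes the bits first.
            System.out.println("hash " + h + " -> bucket " + (h % capacity));
        }
        // All three lines print "bucket 2": unique hashes, same bucket.
    }
}
```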

HashMap is likely to be faster than Hashtable, unless you need thread safety (in which case I suggest you use ConcurrentHashMap).
IMHO, Hashtable has been a legacy collection for 12 years, and I would suggest you only use it if you have to.

埋葬我深情 2024-09-18 18:07:24

What functionality do you need that a linked list won't give you?

偏闹i 2024-09-18 18:07:24

Surprisingly, your idea will work, if you know all the keys you want to put in the collection in advance. The idea is to generate a special hash function which maps each key to a unique value in the range (1, n). Then our "hash table" is just a simple array (plus an integer to cache the number of elements).

Implementing this is not trivial, but it's not rocket science either. I'll leave it to Steve Hanov to explain the ins-and-outs, as he gives a much better explanation than I ever could.
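
This is not Hanov's actual construction (his article describes a smarter two-level scheme), but a brute-force sketch of the core idea under the assumption that the key set is small and fixed: search for a seed whose seeded hash sends every known key to a distinct slot, after which lookups are plain array indexing.

```java
import java.util.HashSet;
import java.util.Set;

public class PerfectHashDemo {
    // FNV-style hash perturbed by a seed; the seed is what we search over.
    static int slot(String key, int seed, int n) {
        int h = 0x811c9dc5;
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 16777619 + seed;
        }
        return Math.floorMod(h, n);
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "date"}; // known in advance
        int n = keys.length;
        for (int seed = 0; seed < 1_000_000; seed++) {
            Set<Integer> used = new HashSet<>();
            boolean ok = true;
            for (String k : keys) ok &= used.add(slot(k, seed, n));
            if (ok) {
                // Every key now has its own slot: the "hash table" is an array.
                for (String k : keys)
                    System.out.println(k + " -> slot " + slot(k, seed, n));
                return;
            }
        }
        System.out.println("no collision-free seed found in the search range");
    }
}
```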

梦里的微风 2024-09-18 18:07:24

It's simple. Just use a hash map. You don't need to do anything special. A HashMap itself is O(1) for insertion, deletion, and counting the number of elements.

Even if the keys are not unique, the algorithm will still be O(1), as long as the HashMap automatically expands in size when the collection gets too large (most implementations will do this for you automatically).

So, just use the hash map according to the given documentation, and all will be well. Don't think up anything more complicated; it will just be a waste of time.

Avoiding collisions is really impossible with a hash; if it were possible, it would basically just be an array or a mapping to an array, not a hash. But it isn't necessary to avoid collisions; it will still be O(1) with collisions.
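
A small demonstration of that last point: two distinct keys deliberately share a hash code, yet the HashMap still stores and retrieves both correctly by falling back on equals() within the bucket.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }

        @Override public int hashCode() { return 7; } // every key collides

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(this.name);
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> m = new HashMap<>();
        m.put(new Key("a"), 1);
        m.put(new Key("b"), 2);
        System.out.println(m.get(new Key("a"))); // prints 1
        System.out.println(m.get(new Key("b"))); // prints 2
        System.out.println(m.size());            // prints 2
    }
}
```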
