Hashtable and list side-by-side?
I need a data structure that is ordered but also gives fast random access, inserts, and removes. Linked lists are ordered and fast at inserts and removes, but they give slow random access. Hash tables give fast random access but are not ordered.
So it seems nice to use both of them together. In my current solution, my hash table holds iterators into the list, and the list contains the actual items. Nice and effective. Okay, it requires double the memory, but that's not an issue.
I have heard that some tree structures could do this also, but are they as fast as this solution?
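For concreteness, the structure described above can be sketched in Java roughly like this. All names here (OrderedIndex, Node, append) are illustrative, not from any library; the point is a hash map whose values are nodes of a doubly-linked list, so lookup is O(1) average and removal unlinks a node without scanning.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the asker's design: hash map for O(1) average lookup,
// doubly-linked list for insertion order and O(1) unlink.
class OrderedIndex<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Map<K, Node<K, V>> index = new HashMap<>();
    private Node<K, V> head, tail;

    public void append(K key, V value) {
        Node<K, V> node = new Node<>(key, value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
            node.prev = tail;
        }
        tail = node;
        index.put(key, node);
    }

    public V get(K key) {              // O(1) average via the hash map
        Node<K, V> n = index.get(key);
        return n == null ? null : n.value;
    }

    public void remove(K key) {        // O(1): the map finds the node, unlink it
        Node<K, V> n = index.remove(key);
        if (n == null) return;
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
    }

    public V first() {                 // first item in insertion order
        return head == null ? null : head.value;
    }
}
```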
5 Answers
The most efficient tree structure I know is the red-black tree, and it's not as fast as your solution: it has O(log n) complexity for all operations, while your solution has O(1) for some, if not all, operations.
If memory is not an issue and you're sure your solution is O(1), meaning the time required to add, delete, or find an item in the structure does not depend on the number of items you have, go for it.
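For comparison, Java's TreeMap is a red-black tree, so it shows the O(log n) trade-off this answer describes: every operation pays a logarithmic cost, but iteration comes out in sorted key order for free. (The TreeDemo wrapper class is illustrative.)

```java
import java.util.TreeMap;

public class TreeDemo {
    static String sortedKeys() {
        // TreeMap is the JDK's red-black tree: O(log n) per get/put/remove,
        // with keys maintained in sorted (comparison) order.
        TreeMap<Integer, String> tree = new TreeMap<>();
        tree.put(3, "three");
        tree.put(1, "one");
        tree.put(2, "two");
        return tree.keySet().toString();   // sorted, regardless of insertion order
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys());  // [1, 2, 3]
    }
}
```

Note that this is sorted order, not insertion order, so a tree is only a drop-in replacement if "ordered" means ordered by key.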
You should consider a skip list, which is an ordered linked list with O(log n) access times. In other words, you can enumerate it in O(n), and indexing/insertion/deletion are O(log n).
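The JDK actually ships a skip list as ConcurrentSkipListMap, which gives expected O(log n) get/put/remove with keys kept in sorted order. A small sketch (the SkipDemo wrapper is illustrative):

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipDemo {
    static String keysBelow(int bound) {
        // A skip list map: expected O(log n) operations, sorted iteration,
        // plus cheap ordered range queries such as headMap.
        ConcurrentSkipListMap<Integer, String> skip = new ConcurrentSkipListMap<>();
        skip.put(10, "ten");
        skip.put(5, "five");
        skip.put(7, "seven");
        return skip.headMap(bound).keySet().toString(); // sorted keys < bound
    }

    public static void main(String[] args) {
        System.out.println(keysBelow(10)); // [5, 7]
    }
}
```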
Trees are made for this. The most appropriate are self-balancing trees such as the AVL tree or the red-black tree. If you deal with very large amounts of data, it may also be useful to use a B-tree (they are used in filesystems, for example).
Concerning your implementation: it may be more or less efficient than trees, depending on the amount of data you work with and on the hash table implementation. For example, some hash tables with very dense data may give access not in O(1) but in O(log n) or even O(n). Also remember that computing the hash of the data takes some time too, so for quite small amounts of data the absolute time spent computing the hash may exceed the time to search for the item in a tree.
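To make the degradation point concrete, here is a small Java sketch (the BadHashDemo and BadKey names are illustrative): a hash function that sends every key to the same bucket keeps the map correct, but each bucket operation becomes a scan of the colliding entries instead of O(1).

```java
import java.util.HashMap;
import java.util.Map;

public class BadHashDemo {
    // A key type whose hashCode always collides. Lookups stay correct,
    // but every operation degrades toward a per-bucket scan rather
    // than the O(1) average a well-distributed hash would give.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    static Integer lookup(int n, int probe) {
        Map<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) m.put(new BadKey(i), i);
        return m.get(new BadKey(probe)); // correct, but not O(1) here
    }

    public static void main(String[] args) {
        System.out.println(lookup(1000, 500)); // 500
    }
}
```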
What you did is pretty much the right choice.
The cool thing about this is that adding ordering to an existing map implementation by using a double-ended doubly-linked list doesn't actually change its asymptotic complexity, because all the relevant list operations (appending and deleting) have a worst-case step complexity of Θ(1). (Yes, deletion is Θ(1), too. The reason it is usually Θ(n) is because you have to find the element to delete first, which is Θ(n), but the actual deletion itself is Θ(1). In this particular case, you let the map do the finding, which is something like Θ(1) amortized worst-case step complexity or Θ(logb n) worst-case step complexity, depending on the type of map implementation used.)
The Hash class in Ruby 1.9, for example, is an ordered map, and it is implemented, at least in YARV and Rubinius, as a hash table embedded into a linked list. Trees generally have a worst-case step complexity of Θ(logb n) for random access, whereas hash tables may be worse in the worst case (Θ(n)), but usually amortize to Θ(1), provided you don't screw up the hash function or the resize function.
[Note: I'm deliberately only talking about asymptotic behavior here, aka "infinitely large" collections. If your collections are small, then just choose the one with the smallest constant factors.]
Java actually contains a LinkedHashMap, which is similar to the data structure you're describing. It can be surprisingly useful at times.
Tree structures could work as well, seeing that they can perform random access (and most other operations) in O(log n) time. Not as fast as hash tables (O(1)), but still fast unless your data set is very large.
The only real advantage of trees is that you don't need to decide on the capacity beforehand. Some hash table implementations can grow their capacity as needed, but they do so by copying all items into a new, larger table once the old one is exceeded, which is very slow: O(n).
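A quick sketch of LinkedHashMap in action, since it is essentially the asker's design packaged in the JDK: a hash table threaded through a doubly-linked list, giving O(1) average get/put/remove and iteration in insertion order. (The LinkedDemo wrapper is illustrative.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedDemo {
    static String insertionOrder() {
        // LinkedHashMap keeps entries linked in insertion order while
        // still providing hash-table access times.
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("first", 1);
        m.put("second", 2);
        m.put("third", 3);
        m.remove("second");            // O(1) average; order of the rest is kept
        return m.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder()); // [first, third]
    }
}
```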