Structure with extremely fast insertion time
I'm looking for an ordered data structure which allows very fast insertion. That's the only property required. Data will only be accessed and deleted from the top element.
To be more precise, I need two structures:
1) The first structure should allow an ordered insertion using an int value. On completing the insertion, it shall report the rank of the inserted element.
2) The second structure should allow insertion at a specified rank.
The number of elements to be stored is likely to be in thousands, or tens of thousands.
[edit] I must amend the volume hypothesis: even though, at any moment, the size of the ordered structure is likely to be in the range of tens of thousands, the total number of insertions is likely to be in the tens of millions per run.
Insertion time in O(1) would be nice, although O(log(log(n))) is very acceptable too. Currently I have some interesting candidates for the first structure only, but they either insert in O(log(n)) or cannot report the insertion rank (which is mandatory).
Comments (4)
What about a form of skip list, specifically the "indexed skiplist" in the linked article? That should give O(lg N) insert and lookup, and O(1) access to the first node, for both your use cases.
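To make the indexed skip list suggestion concrete, here is a minimal sketch of an insert that reports the rank, using per-link span counts. The class and method names (`IndexedSkipList`, `insert`, `pop_min`) and the level cap of 16 are assumptions chosen for illustration, not taken from the linked article.

```python
import random


class _Node:
    __slots__ = ("key", "next", "span")

    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level
        # span[i] = number of level-0 steps from this node to next[i]
        # (the value is unused while next[i] is None)
        self.span = [0] * level


class IndexedSkipList:
    MAX_LEVEL = 16  # assumed cap, ample for tens of thousands of elements

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1
        self.size = 0

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and self._rng.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        """Insert key and return its 0-based rank (count of smaller keys)."""
        update = [self.head] * self.MAX_LEVEL  # last node before key, per level
        rank = [0] * self.MAX_LEVEL            # position of update[i]
        node, pos = self.head, 0
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                pos += node.span[i]
                node = node.next[i]
            update[i], rank[i] = node, pos
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                self.head.span[i] = self.size
            self.level = lvl
        new = _Node(key, lvl)
        for i in range(lvl):
            gap = rank[0] - rank[i]  # level-0 steps from update[i] to update[0]
            new.next[i] = update[i].next[i]
            new.span[i] = update[i].span[i] - gap
            update[i].next[i] = new
            update[i].span[i] = gap + 1
        for i in range(lvl, self.level):
            update[i].span[i] += 1  # links that now jump over the new node
        self.size += 1
        return rank[0]

    def pop_min(self):
        """Remove and return the smallest key (the first node)."""
        first = self.head.next[0]
        if first is None:
            raise IndexError("pop from empty skip list")
        for i in range(self.level):
            if self.head.next[i] is first:
                self.head.next[i] = first.next[i]
                self.head.span[i] = first.span[i]
            else:
                self.head.span[i] -= 1
        self.size -= 1
        return first.key
```

Insertion is expected O(log N) and removal of the first node touches at most `MAX_LEVEL` links. The same span bookkeeping can also locate a position by rank, which is what the second structure (insertion at a specified rank) would need; that variant is not shown here.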
--Edit--
When I think of O(1) algorithms, I think of radix-based methods. Here is an O(1) insert with rank returned. The idea is to break the key up into nibbles, and keep a count of all the inserted items which have that prefix. Unfortunately, the constant is high (<=64 dereferences and additions), and the storage is O(2 x 2^INT_BITS), which is awful. This is the version for 16-bit ints; expanding to 32 bits should be straightforward.
This structure also supports O(1) GetMin and RemoveMin. (GetMin is instant, Remove has a constant similar to Insert.)
If your data is sparse and well distributed, you could remove the p4 counter, and instead do an insertion sort into the P3 level. That would reduce storage costs by a factor of 16, at the cost of a higher worst-case insert when there are many similar values.
Another idea to improve the storage would be to combine this idea with something like an Extendable Hash. Use the integer key as the hash value, and keep a count of the inserted nodes in the directory. Doing a sum over the relevant directory entries on an insert (as above) should still be O(1) with a large constant, but the storage would reduce to O(N).
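The answer's original code did not survive on this page, so the following is only a sketch of the nibble-counting idea as described in the text, for 16-bit keys. The class name `NibbleRankCounter`, its methods, and the mapping of levels `p[0]` to `p[3]` onto the answer's P1 to p4 are assumptions for illustration.

```python
class NibbleRankCounter:
    """Per-prefix counts of inserted 16-bit keys, one level per nibble."""

    def __init__(self):
        # p[d] holds one counter per (d + 1)-nibble prefix:
        # 16, 256, 4096 and 65536 counters respectively.
        self.p = [[0] * (16 ** (d + 1)) for d in range(4)]
        self.size = 0

    def insert(self, key):
        """Record key and return its rank (number of smaller keys stored)."""
        if not 0 <= key < 1 << 16:
            raise ValueError("key must fit in 16 bits")
        rank = 0
        for d in range(4):
            prefix = key >> (4 * (3 - d))  # first d + 1 nibbles of the key
            base = prefix & ~0xF           # same prefix with last nibble zeroed
            counts = self.p[d]
            # Siblings share the earlier nibbles but have a smaller last
            # nibble, so every key under them is smaller (at most 15 adds).
            for sibling in range(base, prefix):
                rank += counts[sibling]
            counts[prefix] += 1
        self.size += 1
        return rank

    def pop_min(self):
        """Remove and return the smallest stored key."""
        if self.size == 0:
            raise IndexError("pop from empty structure")
        prefix = 0
        for d in range(4):
            base = prefix << 4
            counts = self.p[d]
            # First non-empty child among the 16 extending the current prefix.
            prefix = next(c for c in range(base, base + 16) if counts[c])
            counts[prefix] -= 1
        self.size -= 1
        return prefix
```

Each insert performs at most 4 x 15 additions regardless of how many elements are stored, which is where the constant-time bound with a large constant comes from. This sketch tracks keys only; attaching payloads to keys would need extra per-key storage that is not shown.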
An order statistic tree seems to fit your need in O(log N) time. Link
If you only have tens of thousands of elements, the performance difference between O(log N) and O(1) asymptotic time complexity is not as significant as you might think. For example, with 100000 elements, log2(N) is about 16.6, so the log N method is only about 16 times slower, constants aside.
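As a quick check of that figure, the base-2 logarithm can be computed for the sizes mentioned in the question (the variable name `levels` is just for this illustration):

```python
import math

# Approximate number of comparison levels for a balanced structure of size n.
levels = {n: math.log2(n) for n in (1_000, 10_000, 100_000)}

for n, depth in levels.items():
    print(f"n = {n:>7}: log2(n) = {depth:.1f}")
# n =    1000: log2(n) = 10.0
# n =   10000: log2(n) = 13.3
# n =  100000: log2(n) = 16.6
```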
In this case the difference in coefficient (implementation, overheads) may be the real target of optimization. Fancy data structures usually have a much higher overhead due to their inherent complexity (sometimes thousands of times slower). Their implementations are also more likely to be less refined, because they are used less often.
You should benchmark (actually test) the different heap implementations to find the one with the best real performance.
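A minimal benchmarking sketch along those lines, assuming a steady-state workload of one insert plus one removal of the smallest element at a fixed size of about ten thousand. The two candidates (a sorted Python list with `bisect`, and `heapq`, which does not report a rank) and the helper names `run_sorted_list`, `run_heap`, and `bench` are illustrative choices, not implementations named in the thread.

```python
import bisect
import heapq
import random
import time


def run_sorted_list(keys, warm):
    """Sorted list: bisect gives the rank, list.insert is an O(n) memmove."""
    data = sorted(warm)
    for k in keys:
        rank = bisect.bisect_left(data, k)  # rank of the new element
        data.insert(rank, k)
        data.pop(0)                         # remove the top (smallest) element
    return len(data)


def run_heap(keys, warm):
    """Binary heap: O(log n) insert and pop, but no rank is available."""
    data = list(warm)
    heapq.heapify(data)
    for k in keys:
        heapq.heappushpop(data, k)          # push k, then pop the smallest
    return len(data)


def bench(fn, size=10_000, n_ops=50_000, seed=0):
    """Return the average seconds per insert+pop for one candidate."""
    rng = random.Random(seed)
    warm = [rng.randrange(1 << 30) for _ in range(size)]
    keys = [rng.randrange(1 << 30) for _ in range(n_ops)]
    start = time.perf_counter()
    fn(keys, warm)
    return (time.perf_counter() - start) / n_ops


if __name__ == "__main__":
    for fn in (run_sorted_list, run_heap):
        print(f"{fn.__name__}: {bench(fn) * 1e9:.0f} ns per insert+pop")
```

The timings depend entirely on the machine and interpreter, so no numbers are given here; the point is to measure candidates on the real key distribution and structure size before committing to a more elaborate design.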
You say you need an ordered data structure, which to me sounds like you need something that can yield all the contained elements, in order, in O(n) time.
But then you say you will only be accessing the top (least?) element, suggesting that you really just need something that can yield the minimum value, repeatedly - opening the door to something with a partial ordering.
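For illustration, a binary heap is one such partially ordered structure: it only guarantees that the minimum is at the front, yet repeated removal still yields the values in ascending order. This sketch uses Python's standard `heapq` module; note that a heap cannot report the rank of an inserted element, which the question lists as mandatory for the first structure.

```python
import heapq

heap = []
for value in (42, 7, 19, 3, 25):
    heapq.heappush(heap, value)   # O(log n) insert, no rank reported

smallest = heap[0]                # O(1) peek at the minimum

# Repeatedly yielding the minimum drains the heap in ascending order.
drained = [heapq.heappop(heap) for _ in range(len(heap))]
print(smallest, drained)          # 3 [3, 7, 19, 25, 42]
```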
Which is it?
If I understand your question correctly, I would recommend using a dictionary whose keys are ranks and whose values are linked lists.
With the keys you get ranks, and with linked lists as the values you get O(1) insertion time. Removal is also O(1). You can implement a stack or a queue with a linked list, which is what you want.
Or you can just use a doubly linked list, which guarantees O(1) insertion and removal. For ranking, you can embed that information within the nodes.