Speeding up the typical linked-list "find" by pairing it with a bubble-sort-style swap?
I'm making a Forth-like language that is supposed to run in places where memory is at a premium.

The language uses a linked list for its dictionary of language words, due to space constraints and the lack of resizable memory.

However, it hurts me to think of the performance cost of the linked-list lookup. I understand that the worst case will always be O(n), but I was trying to think of ways to at least improve the typical case when I realized something: what if the "find" method, in addition to finding a key, also performs a single bubble-sort-like swap each time a key is hit? This way the most common keys will "bubble" to the top. Even better, the entire list gets re-weighted as compilation continues and should roughly correlate with each key's ongoing statistical likelihood.

Has this technique been used in other places? I'm curious whether there is a decent mathematical demonstration of its runtime complexity (assuming a statistical curve of some names vs. others). A single bubble-sort swap is clearly O(1), so at least it can't hurt the theoretical runtime complexity.
Comments (1)
While this strategy should improve the average run time in common cases, it does not change the worst-case complexity, which is O(n). Indeed, if the searched key is at the end of a list of size n, a find will run in O(n) time. The bubble-swap operation runs in O(1) time (assuming keys can actually be compared in constant time). The next find operation is a bit faster if the same key is fetched, but it is still O(n). After n fetches of the same key, fetching that key can be done in O(1) time. However, fetching n other keys in a specific order can reorder the list so that the initial key ends up back at the end. More specifically, fetching the item right behind the initial key n times does that. In the end, fetching the initial key n times and then the n other keys in that specific order results in (n + n-1 + n-2 + ... + 2 + 1) + (1 + 2 + ... + n-1 + n) = n*(n+1) = n²+n = O(n²) operations. After these operations, the list should be back in the same state as the initial one. Thus, even the amortized complexity per find is clearly O(n).
Note that you can implement a cache to speed up fetches. Many policies exist; you can find some of them described here. This should not impact the complexity, but it will certainly improve the execution time a lot in practice. The cache can store an iterator (pointer) to a node of the linked list. Such iterators are not invalidated when items are inserted or deleted (unless the target item itself is deleted).
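For illustration, a minimal sketch of such a cache in C might look like the following; the size, hash function, and names are arbitrary choices, not a reference to any particular library:

```c
#include <string.h>

/* Same illustrative node layout as in the question's sketch. */
typedef struct Entry {
    const char   *name;
    void         *code;
    struct Entry *next;
} Entry;

/* A tiny direct-mapped cache of node pointers. CACHE_SIZE and the hash
 * are arbitrary example choices. The cached pointers stay valid even if
 * the list is reordered, because nodes are only re-linked, never moved
 * in memory. */
#define CACHE_SIZE 32
static Entry *cache[CACHE_SIZE];

static unsigned slot_of(const char *name)
{
    unsigned h = 5381;                        /* djb2-style string hash */
    while (*name)
        h = h * 33u + (unsigned char)*name++;
    return h % CACHE_SIZE;
}

Entry *cached_find(Entry *head, const char *name)
{
    unsigned i = slot_of(name);
    if (cache[i] != NULL && strcmp(cache[i]->name, name) == 0)
        return cache[i];                      /* hit: O(1), no list walk */

    Entry *e = head;                          /* miss: plain O(n) scan */
    while (e != NULL && strcmp(e->name, name) != 0)
        e = e->next;

    if (e != NULL)
        cache[i] = e;                         /* remember the node pointer */
    return e;
}

/* Must be called when an entry is removed from the dictionary, so the
 * cache never keeps a dangling pointer. */
void cache_forget(const Entry *e)
{
    unsigned i = slot_of(e->name);
    if (cache[i] == e)
        cache[i] = NULL;
}
```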
Note also that linked lists are generally very slow in practice. They are not very memory-efficient either, because the pointer to the next item takes up space (8 bytes on a 64-bit architecture). Allocated nodes may also carry some hidden overhead depending on the allocator used (some allocators store metadata such as the allocated size). One way to use less memory is a linked list whose nodes contain buckets of key-value pairs.
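As a hedged sketch of that idea in C (made-up names, arbitrary bucket size):

```c
#include <string.h>

/* A hypothetical "unrolled" dictionary node: one next-pointer is shared
 * by a whole bucket of entries, so the per-entry pointer overhead shrinks
 * by roughly a factor of BUCKET_SIZE. The constant is an example value. */
#define BUCKET_SIZE 8

typedef struct Bucket {
    unsigned char  count;                  /* slots in use in this bucket */
    const char    *names[BUCKET_SIZE];     /* keys                        */
    void          *codes[BUCKET_SIZE];     /* associated values           */
    struct Bucket *next;
} Bucket;

/* Linear search over the buckets: still O(n) overall, but with far
 * fewer nodes to chase and better locality within each node. */
void *bucket_find(const Bucket *b, const char *name)
{
    for (; b != NULL; b = b->next)
        for (unsigned i = 0; i < b->count; ++i)
            if (strcmp(b->names[i], name) == 0)
                return b->codes[i];
    return NULL;
}
```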
Note that while balanced binary search trees require a bit more memory per node, they can solve this problem much more efficiently: the complexity of finding a key in such a data structure is O(log n). A good hash-map implementation can also be quite compact in memory (see hopscotch hashing), although its memory consumption can spike temporarily when the table is resized.