哈希表与自平衡搜索树

发布于 2024-09-10 15:34:12 字数 349 浏览 1 评论 0原文

我很想知道使用自平衡树技术来存储项目比使用哈希表更重要的推理是什么。

我发现哈希表无法维护插入顺序,但我始终可以在顶部使用链表来存储插入顺序序列。

我发现对于少量的值,哈希函数会增加成本,但我总是可以将哈希函数与密钥一起保存以加快查找速度。

我知道哈希表比直接实现红黑树更难实现,但在实际实现中,人们不会愿意为此付出额外的努力吗?

我发现对于哈希表,发生冲突是正常的,但是使用开放寻址技术(例如允许将密钥保存在哈希表本身中的双重哈希),问题是否已减少到不予支持的效果对于这样的实现,红黑树?

我很好奇我是否严格忽略了哈希表的一个缺点,该缺点仍然使红黑树在实际应用程序(如文件系统等)中相当可行的数据结构。

I am curious to know what is the reasoning that could overweighs towards using a self-balancing tree technique to store items than using a hash table.

I see that hash tables cannot maintain the insertion-order, but I could always use a linked list on top to store the insertion-order sequence.

I see that for small number of values, there is an added cost of of the hash-function, but I could always save the hash-function together with the key for faster lookups.

I understand that hash tables are difficult to implement than the straight-forward implementation of a red-black tree, but in a practical implementation wouldn't one be willing to go an extra mile for the trouble?

I see that with hash tables it is normal for collisions to occur, but with open-addressing techniques like double hashing that allow to save the keys in the hash table itself, hasn't the problem been reduced to the effect of not tipping the favor towards red black trees for such implementations?

I am curious if I am strictly missing a disadvantage of hash table that still makes red black trees quite viable data structure in practical applications (like filesystems, etc.).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

蓝眼睛不忧郁 2024-09-17 15:34:12

这是我能想到的:

  1. 有些数据无法进行哈希处理(或者哈希成本太高),因此无法存储在哈希表中。
  2. 树按照您需要的顺序(排序)保存数据,而不是插入顺序。即使您通过哈希表运行链表,也无法(有效地)使用哈希表来做到这一点。
  3. 树有更好的最坏情况性能

Here is what I can think of:

  1. There are kinds of data which cannot be hashed (or is too expensive to hash), therefore cannot be stored in hash tables.
  2. Trees keep data in the order you need (sorted), not insertion order. You can't (effectively) do that with hash table, even if you run a linked list through it.
  3. Trees have better worst-case performace
孤独难免 2024-09-17 15:34:12

存储分配是另一个考虑因素。每次填充哈希表中的所有存储桶时,您都需要分配新的存储空间并重新哈希所有内容。如果您提前知道数据的大小,则可以避免这种情况。另一方面,平衡树根本不会遇到这个问题。

Storage allocation is another consideration. Every time you fill all of the buckets in a hash-table, you need to allocate new storage and re-hash everything. This can be avoided if you know the size of the data ahead of time. On the other hand, balanced trees don't suffer from this issue at all.

烈酒灼喉 2024-09-17 15:34:12

只是想添加:

  • 平衡二叉树具有可预测的获取数据 [log n] 的时间,与数据类型无关。很多时候,估计应用程序的响应时间对于您的应用程序可能很重要。 [哈希表可能具有不可预测的响应时间]。请记住,对于较小的 n,就像在大多数常见用例中一样,内存中查找的性能差异几乎不重要,系统的瓶颈将在其他地方,有时您只是想让系统更简单调试和分析。

  • 与哈希表相比,树通常具有更高的内存效率,并且实现起来更简单,无需对输入键的分布和可能的冲突等进行任何分析。

    与哈希

Just wanted to add :

  • Balanced binary trees have a predictable time of fetching a data [log n] independent of the type of data. Many times that may be important for your application to estimate the response times for your application. [hash tables may have unpredictable response times]. Remember for smaller n's as in most common use cases the difference in performance in an in-memory look up is hardly going to matter and the bottle neck of the system is going to be elsewhere and sometimes you just want to make the system much simpler to debug and analyze.

  • Trees are generally more memory efficient compared to hash tables and much simpler to implement without any analysis on the distribution of input keys and possible collisions etc.

緦唸λ蓇 2024-09-17 15:34:12

以我的拙见,自平衡树作为学术主题非常有效。而我
不知道任何可以被限定为“”的直接实现
红黑树”

在现实世界中,内存墙使它们的效率远低于纸上的效率。

考虑到这一点,哈希表是不错的选择,特别是如果您不练习的话
它们是学术风格(忘记表格大小限制,你会神奇地解决
表调整大小问题和几乎所有冲突问题)。

简而言之:保持简单。如果这对您来说很简单,那么对您的计算机来说也很简单。

In my humble opinion, self-balancing trees work pretty well as Academic topics. And I
do not know anything that can be qualified as a "straight-forward implementation of a
red-black tree"
.

In the real world, the memory wall makes them far less efficient than they are on paper.

With this in mind, hash tables are decent alternatives, especially if you don't practice
them the Academic style (forget about the table size constraint and you magically resolve
the table resize issue and almost all collision issues).

In a word: keep it simple. If that's simple for you then that's simple for your computer.

山川志 2024-09-17 15:34:12

我认为如果你想查询一系列键而不是一个键,自平衡树结构会比哈希表结构表现更好。

I think if you want to query for a range of keys instead of one key, self balanced tree structure will perform better than a hash table structure.

多彩岁月 2024-09-17 15:34:12

我能想到的几个原因:

  1. 树是动态的(空间复杂度为 N),而哈希表通常被实现为固定大小的数组,这意味着它们通常会用 K 大小来初始化,其中 K > 。 N,所以即使哈希图中只有 1 个元素,仍然可能有 100 个空槽占用内存。这样做的另一个影响是:

  2. 增加基于数组的哈希表的大小成本高昂(平均时间为 O(N),最坏情况为 O(N log N)),而树可以在恒定时间内增长(O(1 )) + (定位插入点的时间 (O(log N))

  3. 树中的元素可以按排序顺序收集(使用例如:中序遍历)。因此,您可以通常会得到一个排序列表作为树的免费福利。与
  4. 哈希图相比,树可以具有更好的最坏情况性能,具体取决于哈希图的实现方式(例如:具有链接的哈希图将具有 O(N) 最坏情况,而自平衡的哈希图则具有 O(N) 最坏情况。树可以保证所有操作的最坏情况为 O(log N),

自平衡树和哈希图在最佳最坏情况下的最坏情况效率均为 O(log N)(假设哈希图确实处理碰撞),但哈希图可以具有更好的平均情况性能(通常接近 O(1)),而树将具有恒定的 O(log N)。这是因为即使 hashmap 可以在 O(1) 中定位插入索引,它也必须考虑哈希冲突(多个元素哈希到同一数组索引),因此在最好的情况下会降级为自平衡树(例如hashmap的Java实现),即hashmap中的每个元素都可以实现为自平衡树,存储所有已散列到给定数组单元格的元素。

A few reasons I can think of:

  1. Trees are dynamic (the space complexity is N), whereas hash tables are often implemented as arrays which are fixed size, which means they will often be initialized with K size, where K > N, so even if you only have 1 element in a hashmap, you might still have 100 empty slots that take up memory. Another effect of this is:

  2. Increasing the size of an array-based hash table is costly (O(N) average time, O(N log N) worst case), whereas trees can grow in constant time (O(1)) + (time to locate insertion point (O(log N))

  3. Elements in a tree can be gathered in sorted order (using ex: in-order-traversal). Thereby you often get a sorted list as a free perk with trees.
  4. Trees can have a better worst-case performance vs a hashmap depending on how the hashmap is implemented (ex: hashmap with chaining will have O(N) worst case, whereas self-balanced trees can guarantee O(log N) worst case for all operations).

Both self-balanced trees and hashmaps have a worst-case efficiency of O(log N) in the best worst-case (assuming that the hashmap does handle colissions), but Hashmaps can have a better average-case performance (often close to O(1)), whereas Trees will have a constant O(log N). This is because even thou a hashmap can locate the insertion index in O(1), it has to account for hash colissions (more than one element hashing to the same array index), and thus in the best case degrades to a self-balanced tree (such as the Java implementation of hashmap), that is, each element in the hashmap can be implemented as a self-balanced tree, storing all elements which has hashed to the given array cell.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文