在什么情况下我应该使用 HashSet 而不是 TreeSet?
我一直很喜欢树,那美丽的 O(n*log(n)) 和它们的整洁。然而,我认识的每一位软件工程师都尖锐地问我为什么要使用 TreeSet
。从 CS 背景来看,我认为你使用什么并不重要,而且我不喜欢乱搞哈希函数和存储桶(就 Java 而言)。
在什么情况下我应该使用 HashSet
而不是 TreeSet
?
I've always loved trees, that nice O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a TreeSet
. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).
In which cases should I use a HashSet
over a TreeSet
?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(14)
HashSet 比 TreeSet 快得多(大多数操作(如添加、删除和包含)是常数时间与日志时间),但不提供像 TreeSet 那样的排序保证。
HashSet
TreeSet
SortedSet
)first()
、last()
、headSet()
和tailSet()
等要点:
HashSet
和TreeSet
之间。然而,它被实现为一个带有链表的哈希表,它提供了插入顺序迭代,这与 TreeSet 保证的排序遍历不同。因此,使用方式的选择完全取决于您的需求,但我认为即使您需要有序集合,您仍然应该更喜欢使用 HashSet 创建 Set,然后将其转换为 TreeSet。
SortedSet; s = new TreeSet(hashSet);
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet
TreeSet
SortedSet
)first()
,last()
,headSet()
, andtailSet()
etcImportant points:
HashSet
andTreeSet
. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
SortedSet<String> s = new TreeSet<String>(hashSet);
TreeSet 尚未提及的一个优点是它具有更大的“局部性”,这是 (1) 如果两个条目在顺序上相邻的情况下的简写,则为 TreeSet将它们在数据结构中彼此靠近,因此在内存中; (2)这种放置利用了局部性原则,即相似的数据经常被具有相似频率的应用程序访问。
这与
HashSet
形成鲜明对比,后者将条目分布在整个内存中,无论它们的键是什么。当从硬盘读取的延迟成本是从缓存或 RAM 读取的成本的数千倍时,并且当数据确实通过局部性访问时,TreeSet 可能是更好的选择。
One advantage not yet mentioned of a
TreeSet
is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, aTreeSet
places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.This is in contrast to a
HashSet
, which spreads the entries all over memory, no matter what their keys are.When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the
TreeSet
can be a much better choice.基于 @shevchyk 在地图上可爱的视觉答案,这是我的看法:
Basing on lovely visual answer on Maps by @shevchyk here is my take:
HashSet
访问元素的时间复杂度为 O(1),因此它确实很重要。但维持集合中对象的顺序是不可能的。如果维护顺序(就值而不是插入顺序而言)对您很重要,则
TreeSet
非常有用。但是,正如您所指出的,您正在以较慢的时间访问元素的顺序进行交易:基本操作的 O(log n) 。来自
TreeSet
的javadocs< /a>:HashSet
is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn't possible.TreeSet
is useful if maintaining an order(In terms of values and not the insertion order) matters to you. But, as you've noted, you're trading order for slower time to access an element: O(log n) for basic operations.From the javadocs for
TreeSet
:1.HashSet允许空对象。
2.TreeSet不允许有null对象。如果尝试添加 null 值,它将抛出 NullPointerException。
3.HashSet比TreeSet快得多。
例如
1.HashSet allows null object.
2.TreeSet will not allow null object. If you try to add null value it will throw a NullPointerException.
3.HashSet is much faster than TreeSet.
e.g.
大多数使用
HashSet
的原因是操作(平均)为 O(1) 而不是 O(log n)。如果该集合包含标准项目,您将不会“乱用哈希函数”,因为这已经为您完成了。如果集合包含自定义类,则必须实现hashCode
才能使用HashSet
(尽管 Effect Java 展示了如何实现),但如果您使用TreeSet
您必须使其Comparable
或提供一个Comparator
。如果类没有特定的顺序,这可能会出现问题。我有时会使用
TreeSet
(或实际上TreeMap
)来处理非常小的集合/地图(<10 个项目),尽管我还没有检查是否有任何真正的增益这样做。对于大型集合,差异可能相当大。现在,如果您需要排序,那么
TreeSet
是合适的,尽管即使如此,如果更新频繁并且对排序结果的需求不频繁,有时将内容复制到列表或数组并对它们进行排序也可以快一点。The reason why most use
HashSet
is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions" as that has been done for you. If the set contains custom classes, you have to implementhashCode
to useHashSet
(although Effective Java shows how), but if you use aTreeSet
you have to make itComparable
or supply aComparator
. This can be a problem if the class does not have a particular order.I have sometimes used
TreeSet
(or actuallyTreeMap
) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.Now if you need the sorted, then
TreeSet
is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.如果您没有插入足够的元素而导致频繁的重新散列(或冲突,如果您的 HashSet 无法调整大小),那么 HashSet 肯定会给您带来恒定时间访问的好处。但在有大量增长或收缩的集合上,您实际上可能会使用 Treesets 获得更好的性能,具体取决于实现。
如果我没记错的话,使用功能性红黑树,摊销时间可以接近 O(1)。冈崎的书会有比我更好的解释。 (或者参见他的出版物列表)
If you aren't inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can't resize), a HashSet certainly gives you the benefit of constant time access. But on sets with lots of growth or shrinkage, you may actually get better performance with Treesets, depending on the implementation.
Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki's book would have a better explanation than I can pull off. (Or see his publication list)
当然,HashSet 实现要快得多——开销更少,因为没有排序。 http 提供了对 Java 中各种 Set 实现的详细分析: //java.sun.com/docs/books/tutorial/collections/implementations/set.html。
那里的讨论还指出了树与哈希问题的一个有趣的“中间立场”方法。 Java提供了LinkedHashSet,它是一个“面向插入”的链表贯穿其中的HashSet,即链表中的最后一个元素也是最近插入到Hash中的。这使您可以避免无序哈希的不规则性,而不会增加 TreeSet 的成本。
HashSet implementations are, of course, much much faster -- less overhead because there's no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.
The discussion there also points out an interesting 'middle ground' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an "insertion-oriented" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.
TreeSet 是两个排序集合之一(另一个是
树图)。它使用红黑树结构(但你知道),并保证
元素将按照自然顺序升序排列。可选地,
您可以使用构造函数构造一个 TreeSet,该构造函数允许您为该集合提供您的
自己的顺序规则(而不是依赖于定义的顺序)
通过使用 Comparable 或 Comparator
,并且 LinkedHashSet 是 HashSet 的有序版本,
维护一个跨所有元素的双向链接列表。使用此类代替 HashSet
当您关心迭代顺序时。当你迭代 HashSet 时
顺序是不可预测的,而 LinkedHashSet 允许您迭代元素
按照它们插入的顺序
The TreeSet is one of two sorted collections (the other being
TreeMap). It uses a Red-Black tree structure (but you knew that), and guarantees
that the elements will be in ascending order, according to natural order. Optionally,
you can construct a TreeSet with a constructor that lets you give the collection your
own rules for what the order should be (rather than relying on the ordering defined
by the elements' class) by using a Comparable or Comparator
and A LinkedHashSet is an ordered version of HashSet that
maintains a doubly-linked List across all elements. Use this class instead of HashSet
when you care about the iteration order. When you iterate through a HashSet the
order is unpredictable, while a LinkedHashSet lets you iterate through the elements
in the order in which they were inserted
既然可以吃橙子,为什么还要吃苹果呢?
说真的,伙计们,如果您的集合很大,读取和写入了无数次,并且您为 CPU 周期付费,那么只有当您需要它更好地执行时,集合的选择才有意义。然而,在大多数情况下,这并不重要——这里或那里的几毫秒对于人类来说是不会被注意到的。如果它真的那么重要,为什么不用汇编语言或 C 语言编写代码呢? [提示另一场讨论]。因此,关键是,如果您很高兴使用您选择的任何集合,并且它解决了您的问题(即使它不是该任务的最佳集合类型),那么您就会崩溃。该软件具有可塑性。必要时优化您的代码。鲍勃叔叔说过早的优化是万恶之源。 鲍勃叔叔这么说
Why have apples when you can have oranges?
Seriously guys and gals - if your collection is large, read and written to gazillions of times, and you're paying for CPU cycles, then the choice of the collection is relevant ONLY if you NEED it to perform better. However, in most cases, this doesn't really matter - a few milliseconds here and there go unnoticed in human terms. If it really mattered that much, why aren't you writing code in assembler or C? [cue another discussion]. So the point is if you're happy using whatever collection you chose, and it solves your problem [even if it's not specifically the best type of collection for the task] knock yourself out. The software is malleable. Optimise your code where necessary. Uncle Bob says Premature Optimisation is the root of all evil. Uncle Bob says so
即使 11 年后,仍然没有人想到提及一个非常重要的差异。
您是否认为如果
HashSet
等于TreeSet
那么反之亦然?看一下这段代码:尝试猜测输出,然后将鼠标悬停在代码片段下方以查看真正的输出是什么。准备好?干得好:
是的,对于与 equals 不一致的比较器,它们不保持等价关系。其原因是
TreeSet
使用比较器来确定等效性,而HashSet
使用equals
。它们在内部使用HashMap
和TreeMap
,因此您也应该预期上述Map
也会出现这种行为。最初回答
Even after 11 years, nobody thought of mentioning a very important difference.
Do you think that if
HashSet
equalsTreeSet
then the opposite is true as well? Take a look at this code:Try to guess the output and then hover below snippet for seeing what the real output is. Ready? Here you go:
That's right, they don't hold equivalence relation for a comparator that is inconsistent with equals. The reason for this is that a
TreeSet
uses a comparator to determine the equivalence whileHashSet
usesequals
. Internally they useHashMap
andTreeMap
so you should expect this behavior with the mentionedMap
s as well.Originally answered
消息编辑(完全重写)当顺序不重要时,就是这样。两者都应该给出 Log(n) - 看看其中一个是否比另一个快 5% 以上会很有用。 HashSet 可以在循环中进行 O(1) 测试,应该可以揭示是否如此。
Message Edit ( complete rewrite ) When order does not matter, that's when. Both should give Log(n) - it would be of utility to see if either is over five percent faster than the other. HashSet can give O(1) testing in a loop should reveal whether it is.
基于技术考虑,尤其是性能方面,已经给出了很多答案。
据我所知,TreeSet 和 HashSet 之间的选择很重要。
但我宁愿说选择应该首先由概念考虑因素驱动。
如果对于需要操作的对象,自然排序没有意义,则不要使用 TreeSet。
它是一个排序集,因为它实现了
SortedSet
。因此,这意味着您需要重写函数compareTo
,它应该与返回函数equals
的内容一致。例如,如果您有一组名为 Student 的类的对象,那么我认为TreeSet
没有意义,因为学生之间没有自然顺序。你可以按照他们的平均成绩来排序,好吧,但这不是“自然排序”。不仅当两个对象代表同一个学生时,而且当两个不同的学生具有相同的成绩时,函数compareTo
也会返回 0。对于第二种情况,equals
将返回 false(除非您决定让后者在两个不同的学生具有相同成绩时返回 true,这将使equals
函数具有误导性意思,不是说错误的意思。)请注意,
equals
和compareTo
之间的一致性是可选的,但强烈建议这样做。否则接口Set
的契约被破坏,使您的代码误导其他人,从而也可能导致意外的行为。这个链接可能是一个很好的来源有关此问题的信息。
A lot of answers have been given, based on technical considerations, especially around performance.
According to me, choice between
TreeSet
andHashSet
matters.But I would rather say the choice should be driven by conceptual considerations first.
If, for the objects your need to manipulate, a natural ordering does not make sense, then do not use
TreeSet
.It is a sorted set, since it implements
SortedSet
. So it means you need to override functioncompareTo
, which should be consistent with what returns functionequals
. For example if you have a set of objects of a class called Student, then I do not think aTreeSet
would make sense, since there is no natural ordering between students. You can order them by their average grade, okay, but this is not a "natural ordering". FunctioncompareTo
would return 0 not only when two objects represent the same student, but also when two different students have the same grade. For the second case,equals
would return false (unless you decide to make the latter return true when two different students have the same grade, which would makeequals
function have a misleading meaning, not to say a wrong meaning.)Please note this consistency between
equals
andcompareTo
is optional, but strongly recommended. Otherwise the contract of interfaceSet
is broken, making your code misleading to other people, thus also possibly leading to unexpected behavior.This link might be a good source of information regarding this question.