在什么情况下我应该使用 HashSet 而不是 TreeSet?

发布于 2024-08-05 17:07:13 字数 214 浏览 8 评论 0原文

我一直很喜欢树,那美丽的 O(n*log(n)) 和它们的整洁。然而,我认识的每一位软件工程师都尖锐地问我为什么要使用 TreeSet。从 CS 背景来看,我认为你使用什么并不重要,而且我不喜欢乱搞哈希函数和存储桶(就 Java 而言)。

在什么情况下我应该使用 HashSet 而不是 TreeSet

I've always loved trees, that nice O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a TreeSet. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).

In which cases should I use a HashSet over a TreeSet?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

薆情海 2024-08-12 17:07:13

HashSet 比 TreeSet 快得多(大多数操作(如添加、删除和包含)是常数时间与日志时间),但不提供像 TreeSet 那样的排序保证。

HashSet

  • 该类为基本操作(添加、删除、包含和大小)提供恒定时间性能。
  • 它不保证元素的顺序随着时间的推移保持不变。
  • 迭代性能取决于 HashSet 的初始容量负载因子
    • 接受默认负载因子是相当安全的,但您可能需要指定大约是您预计集合增长到的大小两倍的初始容量。

TreeSet

  • 保证 log(n) 时间成本基本操作(添加、删除和包含)
  • 保证 set 的元素将被排序(升序、自然或您通过其构造函数指定的排序)(实现 SortedSet)
  • 不提供任何迭代性能调整参数,
  • 提供了一些方便的方法处理有序集,例如 first()last()headSet()tailSet()

要点:

  • 两者都保证元素的无重复集合
  • 通常,将元素添加到 HashSet,然后将集合转换为 TreeSet 以进行无重复排序遍历会更快。
  • 这些实现都不是同步的。也就是说,如果多个线程同时访问一个集合,并且至少有一个线程修改了该集合,则必须进行外部同步。
  • LinkedHashSet 在某种意义上介于 HashSetTreeSet 之间。然而,它被实现为一个带有链表的哈希表,它提供了插入顺序迭代,这与 TreeSet 保证的排序遍历不同

因此,使用方式的选择完全取决于您的需求,但我认为即使您需要有序集合,您仍然应该更喜欢使用 HashSet 创建 Set,然后将其转换为 TreeSet。

  • 例如 SortedSet; s = new TreeSet(hashSet);

HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.

HashSet

  • the class offers constant time performance for the basic operations (add, remove, contains and size).
  • it does not guarantee that the order of elements will remain constant over time
  • iteration performance depends on the initial capacity and the load factor of the HashSet.
    • It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.

TreeSet

  • guarantees log(n) time cost for the basic operations (add, remove and contains)
  • guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements SortedSet)
  • doesn't offer any tuning parameters for iteration performance
  • offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc

Important points:

  • Both guarantee duplicate-free collection of elements
  • It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
  • None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
  • LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.

So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.

  • e.g. SortedSet<String> s = new TreeSet<String>(hashSet);
塔塔猫 2024-08-12 17:07:13

TreeSet 尚未提及的一个优点是它具有更大的“局部性”,这是 (1) 如果两个条目在顺序上相邻的情况下的简写,则为 TreeSet将它们在数据结构中彼此靠近,因此在内存中; (2)这种放置利用了局部性原则,即相似的数据经常被具有相似频率的应用程序访问。

这与 HashSet 形成鲜明对比,后者将条目分布在整个内存中,无论它们的键是什么。

当从硬盘读取的延迟成本是从缓存或 RAM 读取的成本的数千倍时,并且当数据确实通过局部性访问时,TreeSet 可能是更好的选择。

One advantage not yet mentioned of a TreeSet is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.

This is in contrast to a HashSet, which spreads the entries all over memory, no matter what their keys are.

When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet can be a much better choice.

却一份温柔 2024-08-12 17:07:13

基于 @shevchyk 在地图上可爱的视觉答案,这是我的看法:

╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗
║   Property   ║       HashSet       ║      TreeSet      ║     LinkedHashSet   ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║  no guarantee order ║ sorted according  ║                     ║
║   Order      ║ will remain constant║ to the natural    ║    insertion-order  ║
║              ║      over time      ║    ordering       ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ Add/remove   ║        O(1)         ║     O(log(n))     ║        O(1)         ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║   NavigableSet    ║                     ║
║  Interfaces  ║         Set         ║       Set         ║         Set         ║
║              ║                     ║    SortedSet      ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║    not allowed    ║                     ║
║  Null values ║       allowed       ║ 1st element only  ║      allowed        ║
║              ║                     ║     in Java 7     ║                     ║
╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣
║              ║   Fail-fast behavior of an iterator cannot be guaranteed      ║
║   Fail-fast  ║ impossible to make any hard guarantees in the presence of     ║
║   behavior   ║           unsynchronized concurrent modification              ║
╠══════════════╬═══════════════════════════════════════════════════════════════╣
║      Is      ║                                                               ║
║ synchronized ║              implementation is not synchronized               ║
╚══════════════╩═══════════════════════════════════════════════════════════════╝

Basing on lovely visual answer on Maps by @shevchyk here is my take:

╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗
║   Property   ║       HashSet       ║      TreeSet      ║     LinkedHashSet   ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║  no guarantee order ║ sorted according  ║                     ║
║   Order      ║ will remain constant║ to the natural    ║    insertion-order  ║
║              ║      over time      ║    ordering       ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ Add/remove   ║        O(1)         ║     O(log(n))     ║        O(1)         ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║   NavigableSet    ║                     ║
║  Interfaces  ║         Set         ║       Set         ║         Set         ║
║              ║                     ║    SortedSet      ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║    not allowed    ║                     ║
║  Null values ║       allowed       ║ 1st element only  ║      allowed        ║
║              ║                     ║     in Java 7     ║                     ║
╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣
║              ║   Fail-fast behavior of an iterator cannot be guaranteed      ║
║   Fail-fast  ║ impossible to make any hard guarantees in the presence of     ║
║   behavior   ║           unsynchronized concurrent modification              ║
╠══════════════╬═══════════════════════════════════════════════════════════════╣
║      Is      ║                                                               ║
║ synchronized ║              implementation is not synchronized               ║
╚══════════════╩═══════════════════════════════════════════════════════════════╝
久隐师 2024-08-12 17:07:13

HashSet 访问元素的时间复杂度为 O(1),因此它确实很重要。但维持集合中对象的顺序是不可能的。

如果维护顺序(就值而不是插入顺序而言)对您很重要,则 TreeSet 非常有用。但是,正如您所指出的,您正在以较慢的时间访问元素的顺序进行交易:基本操作的 O(log n) 。

来自TreeSet的javadocs< /a>:

此实现为基本操作(addremovecontains)提供有保证的 log(n) 时间成本。

HashSet is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn't possible.

TreeSet is useful if maintaining an order(In terms of values and not the insertion order) matters to you. But, as you've noted, you're trading order for slower time to access an element: O(log n) for basic operations.

From the javadocs for TreeSet:

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

£冰雨忧蓝° 2024-08-12 17:07:13

1.HashSet允许空对象。

2.TreeSet不允许有null对象。如果尝试添加 null 值,它将抛出 NullPointerException。

3.HashSet比TreeSet快得多。

例如

 TreeSet<String> ts = new TreeSet<String>();
 ts.add(null); // throws NullPointerException

 HashSet<String> hs = new HashSet<String>();
 hs.add(null); // runs fine

1.HashSet allows null object.

2.TreeSet will not allow null object. If you try to add null value it will throw a NullPointerException.

3.HashSet is much faster than TreeSet.

e.g.

 TreeSet<String> ts = new TreeSet<String>();
 ts.add(null); // throws NullPointerException

 HashSet<String> hs = new HashSet<String>();
 hs.add(null); // runs fine
咋地 2024-08-12 17:07:13

大多数使用 HashSet 的原因是操作(平均)为 O(1) 而不是 O(log n)。如果该集合包含标准项目,您将不会“乱用哈希函数”,因为这已经为您完成了。如果集合包含自定义类,则必须实现 hashCode 才能使用 HashSet(尽管 Effect Java 展示了如何实现),但如果您使用 TreeSet您必须使其Comparable 或提供一个Comparator。如果类没有特定的顺序,这可能会出现问题。

我有时会使用 TreeSet (或实际上 TreeMap)来处理非常小的集合/地图(<10 个项目),尽管我还没有检查是否有任何真正的增益这样做。对于大型集合,差异可能相当大。

现在,如果您需要排序,那么 TreeSet 是合适的,尽管即使如此,如果更新频繁并且对排序结果的需求不频繁,有时将内容复制到列表或数组并对它们进行排序也可以快一点。

The reason why most use HashSet is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions" as that has been done for you. If the set contains custom classes, you have to implement hashCode to use HashSet (although Effective Java shows how), but if you use a TreeSet you have to make it Comparable or supply a Comparator. This can be a problem if the class does not have a particular order.

I have sometimes used TreeSet (or actually TreeMap) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.

Now if you need the sorted, then TreeSet is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.

金橙橙 2024-08-12 17:07:13

如果您没有插入足够的元素而导致频繁的重新散列(或冲突,如果您的 HashSet 无法调整大小),那么 HashSet 肯定会给您带来恒定时间访问的好处。但在有大量增长或收缩的集合上,您实际上可能会使用 Treesets 获得更好的性能,具体取决于实现。

如果我没记错的话,使用功能性红黑树,摊销时间可以接近 O(1)。冈崎的书会有比我更好的解释。 (或者参见他的出版物列表

If you aren't inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can't resize), a HashSet certainly gives you the benefit of constant time access. But on sets with lots of growth or shrinkage, you may actually get better performance with Treesets, depending on the implementation.

Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki's book would have a better explanation than I can pull off. (Or see his publication list)

哎呦我呸! 2024-08-12 17:07:13

当然,HashSet 实现要快得多——开销更少,因为没有排序。 http 提供了对 Java 中各种 Set 实现的详细分析: //java.sun.com/docs/books/tutorial/collections/implementations/set.html

那里的讨论还指出了树与哈希问题的一个有趣的“中间立场”方法。 Java提供了LinkedHashSet,它是一个“面向插入”的链表贯穿其中的HashSet,即链表中的最后一个元素也是最近插入到Hash中的。这使您可以避免无序哈希的不规则性,而不会增加 TreeSet 的成本。

HashSet implementations are, of course, much much faster -- less overhead because there's no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.

The discussion there also points out an interesting 'middle ground' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an "insertion-oriented" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.

情泪▽动烟 2024-08-12 17:07:13

TreeSet 是两个排序集合之一(另一个是
树图)。它使用红黑树结构(但你知道),并保证
元素将按照自然顺序升序排列。可选地,
您可以使用构造函数构造一个 TreeSet,该构造函数允许您为该集合提供您的
自己的顺序规则(而不是依赖于定义的顺序)
通过使用 Comparable 或 Comparator

,并且 LinkedHashSet 是 HashSet 的有序版本,
维护一个跨所有元素的双向链接列表。使用此类代替 HashSet
当您关心迭代顺序时。当你迭代 HashSet 时
顺序是不可预测的,而 LinkedHashSet 允许您迭代元素
按照它们插入的顺序

The TreeSet is one of two sorted collections (the other being
TreeMap). It uses a Red-Black tree structure (but you knew that), and guarantees
that the elements will be in ascending order, according to natural order. Optionally,
you can construct a TreeSet with a constructor that lets you give the collection your
own rules for what the order should be (rather than relying on the ordering defined
by the elements' class) by using a Comparable or Comparator

and A LinkedHashSet is an ordered version of HashSet that
maintains a doubly-linked List across all elements. Use this class instead of HashSet
when you care about the iteration order. When you iterate through a HashSet the
order is unpredictable, while a LinkedHashSet lets you iterate through the elements
in the order in which they were inserted

超可爱的懒熊 2024-08-12 17:07:13

既然可以吃橙子,为什么还要吃苹果呢?

说真的,伙计们,如果您的集合很大,读取和写入了无数次,并且您为 CPU 周期付费,那么只有当您需要它更好地执行时,集合的选择才有意义。然而,在大多数情况下,这并不重要——这里或那里的几毫秒对于人类来说是不会被注意到的。如果它真的那么重要,为什么不用汇编语言或 C 语言编写代码呢? [提示另一场讨论]。因此,关键是,如果您很高兴使用您选择的任何集合,并且它解决了您的问题(即使它不是该任务的最佳集合类型),那么您就会崩溃。该软件具有可塑性。必要时优化您的代码。鲍勃叔叔说过早的优化是万恶之源。 鲍勃叔叔这么说

Why have apples when you can have oranges?

Seriously guys and gals - if your collection is large, read and written to gazillions of times, and you're paying for CPU cycles, then the choice of the collection is relevant ONLY if you NEED it to perform better. However, in most cases, this doesn't really matter - a few milliseconds here and there go unnoticed in human terms. If it really mattered that much, why aren't you writing code in assembler or C? [cue another discussion]. So the point is if you're happy using whatever collection you chose, and it solves your problem [even if it's not specifically the best type of collection for the task] knock yourself out. The software is malleable. Optimise your code where necessary. Uncle Bob says Premature Optimisation is the root of all evil. Uncle Bob says so

故事↓在人 2024-08-12 17:07:13

即使 11 年后,仍然没有人想到提及一个非常重要的差异。

您是否认为如果 HashSet 等于 TreeSet 那么反之亦然?看一下这段代码:

TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.add("a");
hashSet.add("A");
System.out.println(hashSet.equals(treeSet));
System.out.println(treeSet.equals(hashSet));

尝试猜测输出,然后将鼠标悬停在代码片段下方以查看真正的输出是什么。准备好?干得好:


正确

是的,对于与 equals 不一致的比较器,它们不保持等价关系。其原因是 TreeSet 使用比较器来确定等效性,而 HashSet 使用 equals。它们在内部使用 HashMapTreeMap,因此您也应该预期上述 Map 也会出现这种行为。

最初回答

Even after 11 years, nobody thought of mentioning a very important difference.

Do you think that if HashSet equals TreeSet then the opposite is true as well? Take a look at this code:

TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.add("a");
hashSet.add("A");
System.out.println(hashSet.equals(treeSet));
System.out.println(treeSet.equals(hashSet));

Try to guess the output and then hover below snippet for seeing what the real output is. Ready? Here you go:

false
true

That's right, they don't hold equivalence relation for a comparator that is inconsistent with equals. The reason for this is that a TreeSet uses a comparator to determine the equivalence while HashSet uses equals. Internally they use HashMap and TreeMap so you should expect this behavior with the mentioned Maps as well.

Originally answered

御守 2024-08-12 17:07:13

消息编辑(完全重写)当顺序不重要时,就是这样。两者都应该给出 Log(n) - 看看其中一个是否比另一个快 5% 以上会很有用。 HashSet 可以在循环中进行 O(1) 测试,应该可以揭示是否如此。

Message Edit ( complete rewrite ) When order does not matter, that's when. Both should give Log(n) - it would be of utility to see if either is over five percent faster than the other. HashSet can give O(1) testing in a loop should reveal whether it is.

陌路终见情 2024-08-12 17:07:13

基于技术考虑,尤其是性能方面,已经给出了很多答案。
据我所知,TreeSet 和 HashSet 之间的选择很重要。

但我宁愿说选择应该首先由概念考虑因素驱动。

如果对于需要操作的对象,自然排序没有意义,则不要使用 TreeSet。
它是一个排序集,因为它实现了 SortedSet。因此,这意味着您需要重写函数 compareTo,它应该与返回函数 equals 的内容一致。例如,如果您有一组名为 Student 的类的对象,那么我认为 TreeSet 没有意义,因为学生之间没有自然顺序。你可以按照他们的平均成绩来排序,好吧,但这不是“自然排序”。不仅当两个对象代表同一个学生时,而且当两个不同的学生具有相同的成绩时,函数 compareTo 也会返回 0。对于第二种情况,equals 将返回 false(除非您决定让后者在两个不同的学生具有相同成绩时返回 true,这将使 equals 函数具有误导性意思,不是说错误的意思。)
请注意,equalscompareTo 之间的一致性是可选的,但强烈建议这样做。否则接口Set的契约被破坏,使您的代码误导其他人,从而也可能导致意外的行为。

这个链接可能是一个很好的来源有关此问题的信息。

A lot of answers have been given, based on technical considerations, especially around performance.
According to me, choice between TreeSet and HashSet matters.

But I would rather say the choice should be driven by conceptual considerations first.

If, for the objects your need to manipulate, a natural ordering does not make sense, then do not use TreeSet.
It is a sorted set, since it implements SortedSet. So it means you need to override function compareTo, which should be consistent with what returns function equals. For example if you have a set of objects of a class called Student, then I do not think a TreeSet would make sense, since there is no natural ordering between students. You can order them by their average grade, okay, but this is not a "natural ordering". Function compareTo would return 0 not only when two objects represent the same student, but also when two different students have the same grade. For the second case, equals would return false (unless you decide to make the latter return true when two different students have the same grade, which would make equals function have a misleading meaning, not to say a wrong meaning.)
Please note this consistency between equals and compareTo is optional, but strongly recommended. Otherwise the contract of interface Set is broken, making your code misleading to other people, thus also possibly leading to unexpected behavior.

This link might be a good source of information regarding this question.

殤城〤 2024-08-12 17:07:13
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class HashTreeSetCompare {

    //It is generally faster to add elements to the HashSet and then
    //convert the collection to a TreeSet for a duplicate-free sorted
    //Traversal.

    //really? 
    O(Hash + tree set) > O(tree set) ??
    Really???? Why?



    public static void main(String args[]) {

        int size = 80000;
        useHashThenTreeSet(size);
        useTreeSetOnly(size);

    }

    private static void useTreeSetOnly(int size) {

        System.out.println("useTreeSetOnly: ");
        long start = System.currentTimeMillis();
        Set<String> sortedSet = new TreeSet<String>();

        for (int i = 0; i < size; i++) {
            sortedSet.add(i + "");
        }

        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useTreeSetOnly: " + (end - start));
    }

    private static void useHashThenTreeSet(int size) {

        System.out.println("useHashThenTreeSet: ");
        long start = System.currentTimeMillis();
        Set<String> set = new HashSet<String>();

        for (int i = 0; i < size; i++) {
            set.add(i + "");
        }

        Set<String> sortedSet = new TreeSet<String>(set);
        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useHashThenTreeSet: " + (end - start));
    }
}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class HashTreeSetCompare {

    //It is generally faster to add elements to the HashSet and then
    //convert the collection to a TreeSet for a duplicate-free sorted
    //Traversal.

    //really? 
    O(Hash + tree set) > O(tree set) ??
    Really???? Why?



    public static void main(String args[]) {

        int size = 80000;
        useHashThenTreeSet(size);
        useTreeSetOnly(size);

    }

    private static void useTreeSetOnly(int size) {

        System.out.println("useTreeSetOnly: ");
        long start = System.currentTimeMillis();
        Set<String> sortedSet = new TreeSet<String>();

        for (int i = 0; i < size; i++) {
            sortedSet.add(i + "");
        }

        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useTreeSetOnly: " + (end - start));
    }

    private static void useHashThenTreeSet(int size) {

        System.out.println("useHashThenTreeSet: ");
        long start = System.currentTimeMillis();
        Set<String> set = new HashSet<String>();

        for (int i = 0; i < size; i++) {
            set.add(i + "");
        }

        Set<String> sortedSet = new TreeSet<String>(set);
        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useHashThenTreeSet: " + (end - start));
    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文