您会使用哪种数据结构：TreeMap 还是 HashMap？（爪哇）

发布于 2024-07-09 07:11:53 字数 691 浏览 8 评论 0原文

描述 | 一个 Java 程序，用于读取文本文件并按字母顺序打印每个唯一单词以及该单词在文本中出现的次数。

程序应该声明一个Map类型的变量来存储单词和相应的出现频率。哪种具体类型呢？ TreeMap 或 HashMap ？

输入应转换为小写。

单词不包含以下任何字符： \t\t\n]f.,!?:;\"()'

示例输出 |

 Word            Frequency
  a                 1
  and               5
  appearances       1
  as                1
         .
         .
         .

备注| 我知道，我已经在 Perl 中看到了大约两行代码的优雅解决方案，但是，我想在 Java 中看到它

编辑：哦，是的，使用其中之一来展示一个实现会很有帮助。这些结构（在 Java 中）。

原文

Description | A Java program to read a text file and print each of the unique words in alphabetical order together with the number of times the word occurs in the text.

The program should declare a variable of type Map<String, Integer> to store the words and corresponding frequency of occurrence. Which concrete type, though? TreeMap<String, Number> or HashMap<String, Number> ?

The input should be converted to lower case.

A word does not contain any of these characters: \t\t\n]f.,!?:;\"()'

Example output |

 Word            Frequency
  a                 1
  and               5
  appearances       1
  as                1
         .
         .
         .

Remark | I know, I've seen elegant solutions to this in Perl with roughly two lines of code. However, I want to see it in Java.

Edit: Oh yeah, it be helpful to show an implementation using one of these structures (in Java).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

倾城花音 2024-07-16 07:11:54

根据速度要求，您还可以使用 Trie。但如果 TreeMap 足够快，那么实现其中之一就没有意义。

回复收藏 0 原文

还在原地等你 2024-07-16 07:11:54

考虑添加或删除数据结构的频率。如果 TreeMap 很高的话，那就不太理想了。除了搜索现有条目 nLn 之外，它还会进行频繁的重新平衡。

另一方面，哈希结构在内存上有点浮夸（过度分配）。如果你能咬紧牙关，那么就选择哈希结构并在需要时进行排序。

回复收藏 0 原文

深海少女心 2024-07-16 07:11:54

这是读取文本文件的 java 示例，根据键排序，然后根据值排序；取决于文件中单词出现的次数。

public class SortFileWords {

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        ValueCompare vc = new ValueCompare(map);
        TreeMap<String, Integer> sorted_map = new TreeMap<String, Integer>(map);
        List<String> list = new ArrayList<>();
        Scanner sc;
        try {
            sc = new Scanner(new File("c:\\ReadMe1.txt"));
            while (sc.hasNext()) {
                list.add(sc.next());
            }
            sc.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        for (String s : list) {
            if (map.containsKey(s)) {
                map.put(s, map.get(s) + 1);
            } else
                map.put(s, 1);
        }

        System.out.println("Unsorted map: " + map);
        sorted_map.putAll(map);
        System.out.println("Sorted map on keys: " + sorted_map);

        TreeMap<String, Integer> sorted_value_map = new TreeMap<>(vc);
        sorted_value_map.putAll(map);
        System.out.println("Sorted map on values: " + sorted_value_map);
    }
}

class ValueCompare implements Comparator<String> {

    Map<String, Integer> map;

    public ValueCompare(Map<String, Integer> map) {
        this.map = map;
    }

    @Override
    public int compare(String s1, String s2) {
        if (map.get(s1) >= map.get(s2))
            return -1;
        else
            return 1;
    }
}

Here is the java example for reading a text file, sorting based on key, then upon values; depending on the number of occurrence of a words in the file.

public class SortFileWords {

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        ValueCompare vc = new ValueCompare(map);
        TreeMap<String, Integer> sorted_map = new TreeMap<String, Integer>(map);
        List<String> list = new ArrayList<>();
        Scanner sc;
        try {
            sc = new Scanner(new File("c:\\ReadMe1.txt"));
            while (sc.hasNext()) {
                list.add(sc.next());
            }
            sc.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        for (String s : list) {
            if (map.containsKey(s)) {
                map.put(s, map.get(s) + 1);
            } else
                map.put(s, 1);
        }

        System.out.println("Unsorted map: " + map);
        sorted_map.putAll(map);
        System.out.println("Sorted map on keys: " + sorted_map);

        TreeMap<String, Integer> sorted_value_map = new TreeMap<>(vc);
        sorted_value_map.putAll(map);
        System.out.println("Sorted map on values: " + sorted_value_map);
    }
}

class ValueCompare implements Comparator<String> {

    Map<String, Integer> map;

    public ValueCompare(Map<String, Integer> map) {
        this.map = map;
    }

    @Override
    public int compare(String s1, String s2) {
        if (map.get(s1) >= map.get(s2))
            return -1;
        else
            return 1;
    }
}

回复收藏 0 原文

人间不值得 2024-07-16 07:11:54

为什么不使用 TreeSet ？

与 TreeMap 相同的排序概念，只不过它是一个 Set - 根据定义，它是“不包含重复元素的集合”。

从您的问题描述来看，听起来好像您需要一个集合，我看不到您将哪些键和值映射在一起。

此类实现 Set 接口，由 TreeMap 实例支持。此类保证排序后的集合将按元素升序排列，根据元素的自然顺序（请参阅 Comparable）排序，或根据集合创建时提供的比较器排序，具体取决于使用的构造函数。

回复收藏 0 原文

泛滥成性 2024-07-16 07:11:54

基本上这取决于要求。有时哈希图很好，有时树图也很好。但哈希映射最好仅使用它们的一些约束来对其进行排序。

回复收藏 0 原文

朕就是辣么酷 2024-07-16 07:11:53

TreeMap对我来说似乎是理所当然的 - 仅仅是因为“按字母顺序”的要求。 HashMap 在迭代时没有顺序； TreeMap 按自然键顺序迭代。

编辑：我认为康拉德的评论可能是建议“使用 HashMap，然后排序”。这很好，因为虽然我们最初会有 N 次迭代，但由于重复，最终我们将拥有 K <= N 个键。我们不妨把昂贵的部分（排序）保存到最后，当我们拥有的键数少于我们在进行过程中保持排序的小但非恒定的打击时。

话虽如此，我暂时坚持我的答案：因为这是实现目标的最简单方式。我们真的不知道OP是否特别担心性能，但这个问题暗示他关心优雅和简洁。使用 TreeMap 使这变得非常简短，这对我很有吸引力。我怀疑如果性能确实是一个问题，那么可能有比 TreeMap 或 HashMap 更好的方法来攻击它:)

回复收藏 0 原文

抱着落日 2024-07-16 07:11:53

TreeMap 胜过 HashMap，因为 TreeMap 已经为您排序了。

但是，您可能需要考虑使用更合适的数据结构，即包。看
Commons Collections - 和 TreeBag 类：

这有一个很好的优化的内部结构和 API：

bag.add("big")
bag.add("small")
bag.add("big")
int count = bag.getCount("big")

编辑：HashMap 与 TreeMap 性能的问题是Jon 回答 - HashMap 和排序可能更快（尝试一下！），但 TreeBag 更容易。对于包包来说也是如此。有一个 HashBag 和一个 TreeBag。根据实现（使用可变整数），包的性能应该优于等效的整数普通映射。与任何性能问题一样，唯一确定的方法就是进行测试。

TreeMap beats HashMap because TreeMap is already sorted for you.

However, you might want to consider using a more appropriate data structure, a bag. See
Commons Collections - and the TreeBag class:

This has a nice optimised internal structure and API:

bag.add("big")
bag.add("small")
bag.add("big")
int count = bag.getCount("big")

EDIT: The question of HashMap vs TreeMap performance was answered by Jon - HashMap and sort may be quicker (try it!), but TreeBag is easier. The same is true for bags. There is a HashBag as well as a TreeBag. Based on the implementation (uses a mutable integer) a bag should outperform the equivalent plain map of Integer. The only way to know for sure is to test, as with any performance question.

回复收藏 0 原文

最偏执的依靠 2024-07-16 07:11:53

我看到很多人说“TreeMap 查找需要 O(n log n)”！怎么会？

我不知道它是如何实现的，但在我看来，它需要O(log n)。

这是因为树中的查找可以在 O(log n) 内完成。每次向树中插入项目时，不必对整个树进行排序。这就是使用树的全部想法！

因此，回到最初的问题，比较的数字是：

HashMap 方法： O(n + k log k) 平均情况，最坏情况可能是更大的

TreeMap 方法： O(k + n log k) 最坏的情况

，其中 n = 文本中的单词数，k = 文本中不同单词的数量。

回复收藏 0 原文

音盲 2024-07-16 07:11:53

哈希映射应该快得多。您不应该根据您希望物品最终如何排列来选择容器；只需对最后的（单词，频率）对列表进行排序即可。通常，要排序的此类对比文件中的单词少，因此使用哈希映射的渐近（和真实）性能会更好。

回复收藏 0 原文

居里长安 2024-07-16 07:11:53

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.Iterator;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class TreeMapExample {

    public static void main (String args[]){
        Map<String,Integer> tm = new TreeMap<String,Integer>();
        try {

            FileInputStream fis = new FileInputStream("Test.txt");
            DataInputStream in = new DataInputStream(fis);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            int countValue = 1;
            while((line = br.readLine())!= null ){
                line = line.replaceAll("[-+.^:;,()\"\\[\\]]","");
                StringTokenizer st = new StringTokenizer(line, " ");    
                while(st.hasMoreTokens()){
                    String nextElement = (String) st.nextElement();

                    if(tm.size()>0 && tm.containsKey(nextElement)){
                        int val = 0;
                        if(tm.get(nextElement)!= null){
                        val = (Integer) tm.get(nextElement);
                        val = val+1;
                        }
                        tm.put(nextElement, val);
                    }else{
                    tm.put(nextElement, 1);
                    }

                }
            }
            for(Map.Entry<String,Integer> entry : tm.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
            }

        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.Iterator;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class TreeMapExample {

    public static void main (String args[]){
        Map<String,Integer> tm = new TreeMap<String,Integer>();
        try {

            FileInputStream fis = new FileInputStream("Test.txt");
            DataInputStream in = new DataInputStream(fis);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            int countValue = 1;
            while((line = br.readLine())!= null ){
                line = line.replaceAll("[-+.^:;,()\"\\[\\]]","");
                StringTokenizer st = new StringTokenizer(line, " ");    
                while(st.hasMoreTokens()){
                    String nextElement = (String) st.nextElement();

                    if(tm.size()>0 && tm.containsKey(nextElement)){
                        int val = 0;
                        if(tm.get(nextElement)!= null){
                        val = (Integer) tm.get(nextElement);
                        val = val+1;
                        }
                        tm.put(nextElement, val);
                    }else{
                    tm.put(nextElement, 1);
                    }

                }
            }
            for(Map.Entry<String,Integer> entry : tm.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
            }

        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}

回复收藏 0 原文

苍暮颜 2024-07-16 07:11:53

您无法将 TreeMap 分配给类型为 Map 的变量。 Double、Long 等可以“放入”到 TreeMap 中。当我从 Map 中“获取”一个值时，它必须是一个 Integer。

完全忽略任何 i18n 问题、内存限制和错误处理，如下所示：

class Counter {

  public static void main(String... argv)
    throws Exception
  {
    FileChannel fc = new FileInputStream(argv[0]).getChannel();
    ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    CharBuffer cb = Charset.defaultCharset().decode(bb);
    Pattern p = Pattern.compile("[^ \t\r\n\f.,!?:;\"()']+");
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    Matcher m = p.matcher(cb);
    while (m.find()) {
      String word = m.group();
      Integer count = counts.get(word);
      count = (count == null) ? 1 : count + 1;
      counts.put(word, count);
    }
    fc.close();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.printf("%s: %d%n", e.getKey(), e.getValue());
    }
  }

}

You can't assign a TreeMap<String,Number> to a variable with the type Map<String,Integer>. Double, Long, etc. can be "put" into a TreeMap<String,Number>. When I "get" a value from a Map<String,Integer>, it must be an Integer.

Completely ignoring any i18n issues, memory constraints, and error handling, here goes:

class Counter {

  public static void main(String... argv)
    throws Exception
  {
    FileChannel fc = new FileInputStream(argv[0]).getChannel();
    ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    CharBuffer cb = Charset.defaultCharset().decode(bb);
    Pattern p = Pattern.compile("[^ \t\r\n\f.,!?:;\"()']+");
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    Matcher m = p.matcher(cb);
    while (m.find()) {
      String word = m.group();
      Integer count = counts.get(word);
      count = (count == null) ? 1 : count + 1;
      counts.put(word, count);
    }
    fc.close();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.printf("%s: %d%n", e.getKey(), e.getValue());
    }
  }

}

回复收藏 0 原文

讽刺将军 2024-07-16 07:11:53

“当一个键已经存在时，它具有与 HashMap 相同的性能。” - 这完全是错误的。 HashMap 的插入操作为 O(1)，TreeMap 的插入操作为 O(n log n)。至少需要 n log n 次检查才能确定它是否在表中！

回复收藏 0 原文

变身佩奇 2024-07-16 07:11:53

为此，我认为最好使用 HashBag 来自 Apache Commons Collections或 HashMultiset 来自番石榴或 HashBag 来自 Eclipse Collections（正式 GS Collections）或任何以下类：

    Order    |  Guava           |   Apache  | Eclipse(GS) | JDK analog
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Not define   | HashMultiset     |   HashBag | HashBag     | HashMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Sorted       | TreeMultiset     |   TreeBag | TreeBag     | TreeMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Linked       |LinkedHashMultiset|     -     |     -       | LinkedHashMap<String, Integere>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent & | ConcurrentHash-  |Synchroniz-|Synchroniz-  | Collections.synchronizedMap(
not define   | Multiset         |   edBag   | edBag       |       HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent   |         -        |Synchroniz-|Synchroniz-  | Collections.synchronizedSorted-
and sorted   |                  |edSortedBag| edSortedBag |       Map(TreeMap<>))
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableMultiset|Unmodifiab-|Unmodifiab-  | Collections.unmodifiableMap(
not define   |                  |   leBag   | leBag       | HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableSorted- |Unmodifiab-|Unmodifiab-  | Collections.unmodifiableSorted-
sorted       | Multiset         |leSortedBag| leSortedBag | Map(TreeMap<String, Integer>))
────────────────────────────────────────────────────────────────────────

示例：

1. 使用 Apache 的 SynchronizedSortedBag：

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Bag bag = SynchronizedSortedBag.synchronizedBag(new TreeBag(Arrays.asList(INPUT_TEXT.split(" "))));

    // Print count words
    System.out.println(bag); // print [1:All!,2:Hello,1:Hi,2:World!]- in natural (alphabet) order
    // Print all unique words
    System.out.println(bag.uniqueSet());    // print [All!, Hello, Hi, World!]- in natural (alphabet) order


    // Print count occurrences of words
    System.out.println("Hello = " + bag.getCount("Hello"));    // print 2
    System.out.println("World = " + bag.getCount("World!"));    // print 2
    System.out.println("All = " + bag.getCount("All!"));    // print 1
    System.out.println("Hi = " + bag.getCount("Hi"));    // print 1
    System.out.println("Empty = " + bag.getCount("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.uniqueSet().size());    //print 4

2. 使用 Eclipse 中的 TreeBag(GC)：

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    MutableSortedBag<String> bag =  TreeBag.newBag(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(bag); // print [All!, Hello, Hello, Hi, World!, World!]- in natural order
    // Print all unique words
    System.out.println(bag.toSortedSet());    // print [All!, Hello, Hi, World!]- in natural order

    // Print count occurrences of words
    System.out.println("Hello = " + bag.occurrencesOf("Hello"));    // print 2
    System.out.println("World = " + bag.occurrencesOf("World!"));    // print 2
    System.out.println("All = " + bag.occurrencesOf("All!"));    // print 1
    System.out.println("Hi = " + bag.occurrencesOf("Hi"));    // print 1
    System.out.println("Empty = " + bag.occurrencesOf("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.toSet().size());    //print 4

3. 使用 Guava 中的 LinkedHashMultiset：

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Multiset<String> multiset = LinkedHashMultiset.create(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(multiset); // print [Hello x 2, World! x 2, All!, Hi]- in predictable iteration order
    // Print all unique words
    System.out.println(multiset.elementSet());    // print [Hello, World!, All!, Hi] - in predictable iteration order

    // Print count occurrences of words
    System.out.println("Hello = " + multiset.count("Hello"));    // print 2
    System.out.println("World = " + multiset.count("World!"));    // print 2
    System.out.println("All = " + multiset.count("All!"));    // print 1
    System.out.println("Hi = " + multiset.count("Hi"));    // print 1
    System.out.println("Empty = " + multiset.count("Empty"));    // print 0

    // Print count all words
    System.out.println(multiset.size());    //print 6

    // Print count unique words
    System.out.println(multiset.elementSet().size());    //print 4

您可以在我的 github 项目中找到更多示例

For this way, in my opinion, better use HashBag from Apache Commons Collections or HashMultiset from Guava or HashBag from Eclipse Collections (formaly GS Collections) or any following classes:

    Order    |  Guava           |   Apache  | Eclipse(GS) | JDK analog
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Not define   | HashMultiset     |   HashBag | HashBag     | HashMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Sorted       | TreeMultiset     |   TreeBag | TreeBag     | TreeMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Linked       |LinkedHashMultiset|     -     |     -       | LinkedHashMap<String, Integere>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent & | ConcurrentHash-  |Synchroniz-|Synchroniz-  | Collections.synchronizedMap(
not define   | Multiset         |   edBag   | edBag       |       HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent   |         -        |Synchroniz-|Synchroniz-  | Collections.synchronizedSorted-
and sorted   |                  |edSortedBag| edSortedBag |       Map(TreeMap<>))
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableMultiset|Unmodifiab-|Unmodifiab-  | Collections.unmodifiableMap(
not define   |                  |   leBag   | leBag       | HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableSorted- |Unmodifiab-|Unmodifiab-  | Collections.unmodifiableSorted-
sorted       | Multiset         |leSortedBag| leSortedBag | Map(TreeMap<String, Integer>))
────────────────────────────────────────────────────────────────────────

Examples:

1. Using SynchronizedSortedBag from Apache:

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Bag bag = SynchronizedSortedBag.synchronizedBag(new TreeBag(Arrays.asList(INPUT_TEXT.split(" "))));

    // Print count words
    System.out.println(bag); // print [1:All!,2:Hello,1:Hi,2:World!]- in natural (alphabet) order
    // Print all unique words
    System.out.println(bag.uniqueSet());    // print [All!, Hello, Hi, World!]- in natural (alphabet) order


    // Print count occurrences of words
    System.out.println("Hello = " + bag.getCount("Hello"));    // print 2
    System.out.println("World = " + bag.getCount("World!"));    // print 2
    System.out.println("All = " + bag.getCount("All!"));    // print 1
    System.out.println("Hi = " + bag.getCount("Hi"));    // print 1
    System.out.println("Empty = " + bag.getCount("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.uniqueSet().size());    //print 4

2. Using TreeBag from Eclipse(GC):

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    MutableSortedBag<String> bag =  TreeBag.newBag(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(bag); // print [All!, Hello, Hello, Hi, World!, World!]- in natural order
    // Print all unique words
    System.out.println(bag.toSortedSet());    // print [All!, Hello, Hi, World!]- in natural order

    // Print count occurrences of words
    System.out.println("Hello = " + bag.occurrencesOf("Hello"));    // print 2
    System.out.println("World = " + bag.occurrencesOf("World!"));    // print 2
    System.out.println("All = " + bag.occurrencesOf("All!"));    // print 1
    System.out.println("Hi = " + bag.occurrencesOf("Hi"));    // print 1
    System.out.println("Empty = " + bag.occurrencesOf("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.toSet().size());    //print 4

3. Using LinkedHashMultiset from Guava:

    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Multiset<String> multiset = LinkedHashMultiset.create(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(multiset); // print [Hello x 2, World! x 2, All!, Hi]- in predictable iteration order
    // Print all unique words
    System.out.println(multiset.elementSet());    // print [Hello, World!, All!, Hi] - in predictable iteration order

    // Print count occurrences of words
    System.out.println("Hello = " + multiset.count("Hello"));    // print 2
    System.out.println("World = " + multiset.count("World!"));    // print 2
    System.out.println("All = " + multiset.count("All!"));    // print 1
    System.out.println("Hi = " + multiset.count("Hi"));    // print 1
    System.out.println("Empty = " + multiset.count("Empty"));    // print 0

    // Print count all words
    System.out.println(multiset.size());    //print 6

    // Print count unique words
    System.out.println(multiset.elementSet().size());    //print 4