
set duplicates Java

What is the best-performing way to get duplicates from a large Set?

Posted 2024-11-19 22:13:57, 546 characters, 1 view, 0 comments

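I have a large Set that contains many words, for example:

"aaa, cCc, dDD, AAA, bbB, BBB, AaA, CCc, ..."

I want to group all the duplicate words in the Set, ignoring case, and then save them in a Vector<Vector<String>> or something similar, so that each inner Vector holds one group of matching words, like this:

Vector: aaa, AAA, AaA, ...

Vector: cCc, CCc, ...

Vector: bbB, BBB, ...

I care about performance because this Set contains a lot of words.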


一袭水袖舞倾城 2024-11-26 22:13:58

I would create a HashMap<String, Vector<String>> hashMap.
Next, for each 'string' in your set:

String key = string.toLowerCase();
if (!hashMap.containsKey(key)) {
    // First time this lower-cased key is seen: start a new group.
    Vector<String> v = new Vector<String>();
    v.add(string);
    hashMap.put(key, v);
} else {
    // Key already present: append to its existing group.
    hashMap.get(key).add(string);
}

At the end, create a Vector of Vectors if needed, or work with hashMap.values() directly.
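On Java 8 or later, the containsKey/get/put sequence above can probably be collapsed into a single Map.computeIfAbsent call. The following is only a minimal sketch under that assumption; the class name CaseInsensitiveGrouping and the method name groupIgnoringCase are made up for illustration, and ArrayList stands in for Vector because no synchronization is needed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class CaseInsensitiveGrouping {

    // Hypothetical helper: groups words that are equal when case is ignored.
    // Key: lower-cased word; value: every original spelling found in the input.
    static Map<String, List<String>> groupIgnoringCase(Set<String> words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : words) {
            // Locale.ROOT keeps the lower-casing independent of the default locale.
            String key = word.toLowerCase(Locale.ROOT);
            // computeIfAbsent creates the list the first time a key is seen,
            // and returns the existing list on every later call.
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        Set<String> input = new HashSet<>(Arrays.asList(
                "aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc"));
        // Prints one list per group; the order of groups is unspecified for a HashMap.
        System.out.println(groupIgnoringCase(input).values());
    }
}
```

This performs one hash lookup per word instead of up to three, although for most inputs the difference is likely small next to the cost of toLowerCase.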


失而复得 2024-11-26 22:13:58

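If you can choose the Set implementation, you can use a TreeSet with a Comparator that compares strings ignoring case. You can then iterate over the sorted elements and easily group the duplicates.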
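One caveat when sketching this idea: a TreeSet treats two elements as the same when the comparator returns 0, so a purely case-insensitive comparator would keep only one of "aaa" and "AAA". The sketch below therefore assumes a comparator that orders case-insensitively first and breaks ties case-sensitively. The class name SortedGrouping and the method name groupBySortedOrder are illustrative, not from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SortedGrouping {

    // Hypothetical helper: sorts the words so that case-insensitive duplicates
    // become neighbours, then cuts the sorted sequence into groups.
    static List<List<String>> groupBySortedOrder(Collection<String> words) {
        // Case-insensitive order first; ties are broken by the case-sensitive
        // natural order, so "aaa" and "AAA" stay distinct elements of the set.
        Comparator<String> order =
                String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());
        TreeSet<String> sorted = new TreeSet<>(order);
        sorted.addAll(words);

        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        String previous = null;
        for (String word : sorted) {
            // Start a new group whenever the word differs from its predecessor
            // by more than case.
            if (previous == null || !word.equalsIgnoreCase(previous)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(word);
            previous = word;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupBySortedOrder(Arrays.asList(
                "aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc")));
    }
}
```

Building the TreeSet costs O(n log n) comparisons, so for a very large input a hash-based grouping is likely to be faster; the sorted approach mainly pays off when the groups are also wanted in alphabetical order.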


烟雨凡馨 2024-11-26 22:13:58

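This iterates over the input set once, and I doubt you can get much faster than that. Swapping the ArrayLists for LinkedLists may trade locality for less copying, which may be a performance gain, but I doubt it. Here's the code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

Set<String> input = new HashSet<String>(Arrays.asList(
    "aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc"));

// Maps each lower-cased word to the list of original spellings.
Map<String, List<String>> tmp = new HashMap<String, List<String>>();

for (String s : input) {
    String low = s.toLowerCase();
    List<String> l = tmp.get(low);

    // First occurrence of this key: create its group.
    if (l == null) {
        l = new ArrayList<String>();
        tmp.put(low, l);
    }

    l.add(s);
}

// One inner list per group of words that differ only in case.
final List<List<String>> result = new ArrayList<List<String>>(tmp.values());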
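The result above also contains single-element groups such as [dDD]. If only genuine duplicates are wanted, the same single-pass grouping could be written with the Java 8 stream API and filtered by group size. This is a sketch under that assumption, not the answerer's code; the class name DuplicateGroups and the method name duplicateGroups are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class DuplicateGroups {

    // Hypothetical helper: returns only the groups that hold more than one
    // spelling of the same word, ignoring case.
    static List<List<String>> duplicateGroups(Collection<String> words) {
        return words.stream()
                // Group by the lower-cased form; the result is a Map<String, List<String>>.
                .collect(Collectors.groupingBy(w -> w.toLowerCase(Locale.ROOT)))
                .values().stream()
                // Keep a group only if at least two spellings fell into it.
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "dDD" has no case-insensitive twin, so it does not appear in the output.
        System.out.println(duplicateGroups(Arrays.asList(
                "aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc")));
    }
}
```

The stream version does the same amount of work as the explicit loop; it is unlikely to be faster, and is offered mainly for readability.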


归属感 2024-11-26 22:13:57

If you truly care about performance, you would not use Vector. As for the sorting problem, one solution is to use a TreeMap or TreeSet and supply a Comparator that defines the equality (ordering) you want.

The instantiation could be:

Map<String, LinkedList<String>> map =
    new TreeMap<String, LinkedList<String>>(new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            // Keys that differ only in case compare as equal.
            return a.compareToIgnoreCase(b);
        }
    });

Usage:

LinkedList<String> entry = map.get(nextKey);
if (entry == null) {
  entry = new LinkedList<String>();
  map.put(nextKey, entry);
}
entry.add(nextKey);
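For the plain ignore-case comparison, the JDK already ships String.CASE_INSENSITIVE_ORDER, so a hand-written Comparator may not be needed. Below is a self-contained sketch of the TreeMap approach under that assumption; the class name TreeMapGrouping and the method name group are illustrative, and ArrayList is used in place of LinkedList only as a matter of preference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TreeMapGrouping {

    // Hypothetical helper: the TreeMap treats keys that differ only in case as
    // the same key, so no lower-cased copy of each word has to be created.
    static Map<String, List<String>> group(Collection<String> words) {
        Map<String, List<String>> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String word : words) {
            List<String> entry = map.get(word);
            if (entry == null) {
                entry = new ArrayList<>();
                // The first spelling encountered becomes the stored key.
                map.put(word, entry);
            }
            entry.add(word);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = group(Arrays.asList(
                "aaa", "cCc", "dDD", "AAA", "bbB", "BBB", "AaA", "CCc"));
        // Lookups ignore case as well: "AAA" finds the group stored under "aaa".
        System.out.println(groups.get("AAA"));
        System.out.println(groups.values());
    }
}
```

Each lookup costs O(log n) string comparisons rather than one hash computation, so whether this beats the HashMap variants on a large input is something to measure rather than assume.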
