Set removeAll ignoring case?

Posted 2024-07-30 13:49:34

OK, so here is my issue. I have two HashSets, and I use the removeAll method to delete the values of one set from the other.

Prior to calling the method, I obviously add the values to the Sets. I call .toUpperCase() on each String before adding because the values are of different cases in both lists. There is no rhyme or reason to the case.

Once I call removeAll, I need to get back the original case of the values that are left in the Set. Is there an efficient way of doing this without running through the original list and using compareToIgnoreCase?

Example:

List1:

"BOB"
"Joe"
"john"
"MARK"
"dave"
"Bill"

List2:

"JOE"
"MARK"
"DAVE"

After this, I create a separate HashSet for each List, calling toUpperCase() on each String, and then call removeAll:

set1.removeAll(set2);

Set1:
    "BOB"
    "JOHN"
    "BILL"

I need to get the list to look like this again:

"BOB"
"john"
"Bill"

Any ideas would be much appreciated. I know it is poor and there should be a standard for the original list, but that is not for me to decide.


Comments (5)

一影成城 2024-08-06 13:49:34

In my original answer, I unthinkingly suggested using a Comparator, but this causes the TreeSet to violate the equals contract and is a bug waiting to happen:

// Don't do this:
Set<String> setA = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
setA.add("hello");
setA.add("Hello");
System.out.println(setA);

Set<String> setB = new HashSet<String>();
setB.add("HELLO");
// Bad code; violates symmetry requirement
System.out.println(setB.equals(setA) == setA.equals(setB));

It is better to use a dedicated type:

import java.util.Locale;

public final class CaselessString {
  private final String string;
  private final String normalized;

  private CaselessString(String string, Locale locale) {
    this.string = string;
    normalized = string.toUpperCase(locale);
  }

  @Override public String toString() { return string; }

  @Override public int hashCode() { return normalized.hashCode(); }

  @Override public boolean equals(Object obj) {
    if (obj instanceof CaselessString) {
      return ((CaselessString) obj).normalized.equals(normalized);
    }
    return false;
  }

  public static CaselessString as(String s, Locale locale) {
    return new CaselessString(s, locale);
  }

  public static CaselessString as(String s) {
    return as(s, Locale.ENGLISH);
  }

  // TODO: probably best to implement CharSequence for convenience
}

This code is less likely to cause bugs:

Set<CaselessString> set1 = new HashSet<CaselessString>();
set1.add(CaselessString.as("Hello"));
set1.add(CaselessString.as("HELLO"));

Set<CaselessString> set2 = new HashSet<CaselessString>();
set2.add(CaselessString.as("hello"));

System.out.println("1: " + set1);
System.out.println("2: " + set2);
System.out.println("equals: " + set1.equals(set2));

This is, unfortunately, more verbose.
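Applied to the data from the question, the wrapper idea could look like the sketch below. It is only an illustration under a few assumptions: the class name CaselessRemoveAllDemo, the nested Key type (a trimmed-down stand-in for CaselessString) and the helper methods are hypothetical, and a LinkedHashSet is used only so that the remaining values come back in insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class, not part of the original answer.
class CaselessRemoveAllDemo {

    // Trimmed-down stand-in for CaselessString: equality and hashing use the
    // upper-cased form, while toString() returns the original spelling.
    static final class Key {
        private final String original;
        private final String normalized;

        Key(String original) {
            this.original = original;
            this.normalized = original.toUpperCase(Locale.ENGLISH);
        }

        @Override public String toString() { return original; }

        @Override public int hashCode() { return normalized.hashCode(); }

        @Override public boolean equals(Object obj) {
            return obj instanceof Key && ((Key) obj).normalized.equals(normalized);
        }
    }

    // Wraps each String; LinkedHashSet keeps insertion order (an assumption
    // made here for readable output, a plain HashSet would also work).
    static Set<Key> wrap(List<String> values) {
        Set<Key> result = new LinkedHashSet<>();
        for (String value : values) {
            result.add(new Key(value));
        }
        return result;
    }

    // Removes list2's values from list1 ignoring case and returns the
    // survivors with their original spelling.
    static List<String> remaining(List<String> list1, List<String> list2) {
        Set<Key> set1 = wrap(list1);
        set1.removeAll(wrap(list2));
        List<String> out = new ArrayList<>();
        for (Key key : set1) {
            out.add(key.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(remaining(
                Arrays.asList("BOB", "Joe", "john", "MARK", "dave", "Bill"),
                Arrays.asList("JOE", "MARK", "DAVE")));
    }
}
```

Because both sets hold the same wrapper type, removeAll gives the same result whichever of the two sets is larger.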

拥醉 2024-08-06 13:49:34

It could be done by:

  1. Moving the content of your lists into case-insensitive TreeSets,
  2. then removing all common Strings case-insensitively thanks to TreeSet#removeAll(Collection<?> c),
  3. and finally relying on the fact that ArrayList#retainAll(Collection<?> c) iterates over the elements of the list and calls contains(Object o) on the provided collection for each of them to decide whether the value should be kept. As the provided collection is case-insensitive, we keep only the Strings that match, ignoring case, what is left in the TreeSet.

The corresponding code:

List<String> list1 = new ArrayList<>(
    Arrays.asList("BOB", "Joe", "john", "MARK", "dave", "Bill")
);

List<String> list2 = Arrays.asList("JOE", "MARK", "DAVE");

// Add all values of list1 in a case insensitive collection
Set<String> set1 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
set1.addAll(list1);
// Add all values of list2 in a case insensitive collection
Set<String> set2 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
set2.addAll(list2);
// Remove all common Strings ignoring case
set1.removeAll(set2);
// Keep in list1 only the remaining Strings ignoring case
list1.retainAll(set1);

for (String s : list1) {
    System.out.println(s);
}

Output:

BOB
john
Bill

NB 1: It is important to put the content of the second list into a case-insensitive TreeSet as well, especially if we don't know its size, because the behavior of TreeSet#removeAll(Collection<?> c) depends on the sizes of both collections. If the current collection is strictly bigger than the provided one, it calls remove(Object o) on the current collection for each element of the provided collection; in this case the provided collection could be a plain list. Otherwise, it calls contains(Object o) on the provided collection to decide whether a given element should be removed, so if that collection is not case-insensitive, we won't get the expected result.

NB 2: The behavior of ArrayList#retainAll(Collection<?> c) described above is the same as that of the default implementation of retainAll(Collection<?> c) in AbstractCollection, so this approach works with any collection whose retainAll(Collection<?> c) behaves the same way.
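As a variant that sidesteps the size-dependent behavior described in NB 1, one could skip the first TreeSet and filter the list directly against a case-insensitive set. This is a sketch rather than the method of the answer above; it assumes Java 8 or later (for Collection#removeIf), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original answer.
class RemoveIfIgnoreCaseDemo {

    // Returns a copy of list1 without the values that appear in list2,
    // comparing case-insensitively and keeping list1's original spelling.
    static List<String> remaining(List<String> list1, List<String> list2) {
        // contains() on this set uses the comparator, so it ignores case.
        Set<String> toRemove = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        toRemove.addAll(list2);

        List<String> result = new ArrayList<>(list1);
        // Always probes toRemove, regardless of the relative sizes.
        result.removeIf(toRemove::contains);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(remaining(
                Arrays.asList("BOB", "Joe", "john", "MARK", "dave", "Bill"),
                Arrays.asList("JOE", "MARK", "DAVE")));
    }
}
```

Unlike the set-based version, this keeps duplicates and the original order of list1; note that a TreeSet built with String.CASE_INSENSITIVE_ORDER does not accept null elements.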

悟红尘 2024-08-06 13:49:34

You can use a HashMap whose keys are the upper-case strings and whose values are the original mixed-case strings.

The keys of a HashMap are unique, and you can get them as a set using HashMap.keySet().

To retrieve the original case, it's as simple as HashMap.get("UPPERCASENAME").

And according to the documentation:

Returns a set view of the keys
contained in this map. The set is
backed by the map, so changes to the
map are reflected in the set, and
vice-versa.
The set supports element
removal, which removes the
corresponding mapping from this map,
via the Iterator.remove, Set.remove,
removeAll, retainAll, and clear
operations. It does not support the
add or addAll operations.

So HashMap.keySet().removeAll will affect the HashMap :)

EDIT: use McDowell's solution. I overlooked the fact that you didn't actually need the letters to be upper case :P
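For reference, the map idea described above might look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, and a LinkedHashMap is used instead of a plain HashMap only so that the remaining values come back in their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, not part of the original answer.
class UpperCaseKeyMapDemo {

    static List<String> remaining(List<String> list1, List<String> list2) {
        // Upper-case key -> original spelling. If two entries differ only in
        // case, the later spelling overwrites the earlier one.
        Map<String, String> byUpper = new LinkedHashMap<>();
        for (String s : list1) {
            byUpper.put(s.toUpperCase(Locale.ENGLISH), s);
        }

        Set<String> upper2 = new HashSet<>();
        for (String s : list2) {
            upper2.add(s.toUpperCase(Locale.ENGLISH));
        }

        // keySet() is a view backed by the map, so removing keys here
        // also removes the corresponding mappings.
        byUpper.keySet().removeAll(upper2);

        // The values still carry the original case.
        return new ArrayList<>(byUpper.values());
    }

    public static void main(String[] args) {
        System.out.println(remaining(
                Arrays.asList("BOB", "Joe", "john", "MARK", "dave", "Bill"),
                Arrays.asList("JOE", "MARK", "DAVE")));
    }
}
```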

星星的軌跡 2024-08-06 13:49:34

This would be an interesting one to solve using google-collections. You could have a constant Function like so:

private static final Function<String, String> TO_UPPER = new Function<String, String>() {
    public String apply(String input) {
        return input.toUpperCase();
    }
};

and then what you're after could be done something like this:

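// final so that the anonymous Predicate below can capture it
final Collection<String> toRemove = Collections2.transform(list2, TO_UPPER);

// list1 is a List, so use Collections2.filter (Sets.filter only accepts a Set)
Collection<String> kept = Collections2.filter(list1, new Predicate<String>() {
    public boolean apply(String input) {
        return !toRemove.contains(input.toUpperCase());
    }
});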

That is:

  • Build an upper-case-only version of the 'to discard' list
  • Apply a filter to the original list, retaining only those items whose uppercased value is not in the upper-case-only list.

Note that the output of Collections2.transform isn't an efficient Set implementation, so if you're dealing with a lot of data and the cost of probing that list will hurt you, you can instead use

Set<String> toRemove = Sets.newHashSet(Collections2.transform(list2, TO_UPPER));

which will restore an efficient lookup, returning the filtering to O(n) instead of O(n^2).
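The same filter idea can also be written without the library. The sketch below swaps in the standard java.util.stream API (Java 8 or later) for the google-collections calls, so it is an equivalent in spirit rather than the code of this answer; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class StreamFilterDemo {

    static List<String> remaining(List<String> list1, List<String> list2) {
        // Upper-case-only version of the "to discard" list, collected into a
        // Set so that each lookup is cheap.
        Set<String> toRemove = list2.stream()
                .map(s -> s.toUpperCase(Locale.ENGLISH))
                .collect(Collectors.toSet());

        // Keep the items whose upper-cased value is not in that set; the
        // items themselves are passed through unchanged.
        return list1.stream()
                .filter(s -> !toRemove.contains(s.toUpperCase(Locale.ENGLISH)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(remaining(
                Arrays.asList("BOB", "Joe", "john", "MARK", "dave", "Bill"),
                Arrays.asList("JOE", "MARK", "DAVE")));
    }
}
```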

时光瘦了 2024-08-06 13:49:34

As far as I know, HashSets use the objects' hashCode method (together with equals) to tell elements apart.
You should therefore override these methods in your object so that values differing only in case are treated as the same element.

If you're really using String, you cannot override them, because you cannot extend the String class.

Therefore you need to create your own class containing a String attribute, which you fill with your content. You might want getValue() and setValue(String) methods to access and modify the string.

Then you can add instances of your own class to the HashSet.

This should solve your problem.

regards
