Removing whitespace characters from a String instance

Posted on 2024-12-01 19:50:15

Is there another way to remove whitespace characters from a String?

1) Other than the ones I already know:

myString.trim()

Pattern.compile("\\s");

2) Is there any reason to look for a different method than the ones I am using?

土豪 2024-12-08 19:50:15

Guava has a preconfigured CharMatcher for whitespace: CharMatcher.whitespace(). It works with Unicode whitespace as well.

Sample usage:

System.out.println(CharMatcher.whitespace().removeFrom("H \ne\tl\u200al \to   "));

Output:

Hello

The CharMatcher also has many other nice features. One of my favorites is the collapseFrom() method, which replaces each run of consecutive matching characters with a single replacement character:

System.out.println(
    CharMatcher.whitespace().collapseFrom("H \ne\tl\u200al \to   ", '*'));

Output:

H*e*l*l*o*
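For comparison, here is a rough JDK-only approximation of the two Guava calls above, in case adding a dependency is not an option. It is a sketch, not Guava's implementation: the class and method names are made up for illustration, and the (?U) flag (Java 7+) makes \s follow the Unicode White_Space property, which is not guaranteed to match exactly the same set of characters as CharMatcher.whitespace().

```java
import java.util.regex.Matcher;

// Hypothetical helper class; the names are illustrative only.
final class JdkWhitespaceSketch {

    // Removes every run of whitespace. (?U) turns on UNICODE_CHARACTER_CLASSES,
    // so \s also matches non-ASCII whitespace such as U+200A (hair space).
    static String removeWhitespace(String input) {
        return input.replaceAll("(?U)\\s+", "");
    }

    // Replaces each run of consecutive whitespace with one replacement character.
    static String collapseWhitespace(String input, char replacement) {
        // quoteReplacement guards against '$' or '\' being used as the replacement.
        String safe = Matcher.quoteReplacement(String.valueOf(replacement));
        return input.replaceAll("(?U)\\s+", safe);
    }

    public static void main(String[] args) {
        String sample = "H \ne\tl\u200al \to   ";
        System.out.println(removeWhitespace(sample));        // Hello
        System.out.println(collapseWhitespace(sample, '*')); // H*e*l*l*o*
    }
}
```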

毅然前行 2024-12-08 19:50:15

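You can simply use myString.replaceAll("\\s", ""). But:

Mind the caveat about Unicode whitespace: by default, \s matches only ASCII whitespace.
Note that the call above also removes line breaks. If you don't want line breaks removed, exclude them from the regex.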
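One possible way to exclude line breaks from the regex is a character-class intersection, which Java's regex syntax supports. The sketch below assumes that "line breaks" means only \r and \n; the class and method names are invented for illustration.

```java
// Hypothetical helper; assumes only \r and \n count as line breaks.
final class KeepLineBreaks {

    // [\s&&[^\r\n]] reads as "whitespace, but not \r or \n".
    static String removeWhitespaceExceptLineBreaks(String input) {
        return input.replaceAll("[\\s&&[^\\r\\n]]", "");
    }

    public static void main(String[] args) {
        // Spaces and the tab are removed; the newline survives.
        System.out.println(removeWhitespaceExceptLineBreaks("a b\tc\nd e"));
    }
}
```

The same effect can be had with the double negative "[^\\S\\r\\n]", which some people find harder to read.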

隐诗 2024-12-08 19:50:15

The reason to keep looking for different techniques is to find one that does what you really want. For example, trim() only removes the whitespace from the beginning and end of the string. To get the same effect with a regex, you have to do something like this:

s = s.replaceAll("^\\s+|\\s+$", "");

And then there's the matter of exactly which characters are removed. By default (and always before Java 7), \s matches only ASCII whitespace characters, i.e.:

"[\\u0009\\u000A\\u000B\\u000C\\u000D\\u0020]"

...while (as Peter observed) trim() simple-mindedly removes all characters at or below codepoint 32 (U+0020 in Unicode notation). I suspect the thinking here was that the other characters are extremely unlikely to appear in a string anyway, and if they do, you probably want to get rid of them. (It works for me, anyway. ☺) But it's something you should be aware of. Here's some code that demonstrates the difference between trim() and the regex approach:

String s = "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007"
         + "\u0008\u0009\n\u000B\u000C\r\u000E\u000F"
         + "\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017"
         + "\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F"
         + "\u0020\u00A0";
System.out.println(s.length());
System.out.println(s.trim().length());
System.out.println(s.replaceAll("\\s", "").length());

output:

34
1
28

The one remaining character in the second line of output is a non-breaking space (U+00A0, or "NBSP" henceforth). There are a lot more whitespace characters once you get outside the ASCII range, but the one you're most likely to encounter is the NBSP. Neither trim() nor the regex removed it, but watch what happens when you change the last line of code to this:

System.out.println(s.replaceAll("(?U)\\s", "").length());

...and run it under Java 7:

34
1
27

By adding the (?U), I turned on UNICODE_CHARACTER_CLASSES mode, as mentioned by @tchrist in his comment. NBSP is a whitespace character, no matter what Character.isWhitespace() says, but that doesn't mean you'll always want to include it in your whitespace matches. That's why Guava (mentioned by @Sean) also includes a BREAKING_WHITESPACE CharMatcher (CharMatcher.breakingWhitespace() in current releases).

In sum, to choose the right tool for removing whitespace, you need to know exactly which whitespace characters you want to remove, and exactly where you want to remove them from. It's not all that complicated, but it's not as simple as legacy tools like trim() and StringTokenizer pretend it is, either.
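To make "know exactly which characters you want to remove" concrete, here is a minimal sketch that takes the definition of whitespace as a parameter. The class and method names are hypothetical; Character.isWhitespace and Character.isSpaceChar are the standard JDK predicates, and they disagree about NBSP.

```java
import java.util.function.IntPredicate;

// Hypothetical helper: removes every code point the caller classifies as removable.
final class WhitespaceFilter {

    static String removeMatching(String input, IntPredicate shouldRemove) {
        StringBuilder kept = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !shouldRemove.test(cp))
             .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        String s = "a\u00A0b c"; // 'a', NBSP, 'b', space, 'c'

        // isWhitespace() does not treat NBSP as whitespace, so only the space goes.
        System.out.println(removeMatching(s, Character::isWhitespace).length()); // 4

        // isSpaceChar() follows the Unicode space-separator categories, which include NBSP.
        System.out.println(removeMatching(s, Character::isSpaceChar).length());  // 3
    }
}
```

Passing the predicate in keeps the "which characters" decision visible at the call site instead of hiding it inside a regex flag.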

橘寄 2024-12-08 19:50:15

trim() removes leading and trailing characters with code points from 0 through 32 (U+0000 to U+0020). This happens to remove most ASCII whitespace, but it also removes all control characters. It doesn't remove any of them inside the String, either.

for(int i=Character.MIN_CODE_POINT;i<=Character.MAX_CODE_POINT;i++)
  if(Character.isWhitespace(i))
    System.out.println(i);

prints

9
10
11
12
13
28
29
30
31
32
5760
6158
8192
8193
8194
8195
8196
8197
8198
8200
8201
8202
8232
8233
8287
12288
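A small sketch of the trim() behavior described above: control characters at the ends are stripped, while anything in the interior is left alone. The helper that prints code points is made up for illustration, since the characters involved are invisible.

```java
import java.util.stream.Collectors;

// Hypothetical demo class; only String.trim() is the API under discussion.
final class TrimDemo {

    // Renders a string as space-separated decimal code points.
    static String codePointsOf(String s) {
        return s.codePoints()
                .mapToObj(cp -> Integer.toString(cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // U+0001 and a space in front, a space and U+0003 at the end, U+0002 inside.
        String s = "\u0001 a\u0002 b \u0003";

        System.out.println(codePointsOf(s));        // 1 32 97 2 32 98 32 3
        System.out.println(codePointsOf(s.trim())); // 97 2 32 98
    }
}
```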

比忠 2024-12-08 19:50:15

I was porting some code from C# to Java and needed to simulate XmlNode.OuterXml and XmlNode.InnerXml. For this I used Transformer, but for some reason it does not handle some whitespace correctly even if you turn indentation off. So my other choice was to post-process the string containing carriage returns, linefeeds and tabs with a regex, using one of these two calls:

// Removes each run of tab, newline, backspace, CR or form feed, plus any spaces right after it.
string = string.replaceAll("[\t\n\b\r\f]+ *", "");
// Removes every run of whitespace, including single spaces between words.
string = string.replaceAll("[\\s]+ *", "");

Both of these remove line breaks, tabs and the indentation that follows them. The first leaves other spaces alone, while the second removes all whitespace in the string. Hope it is at least a little bit relevant. The second one is probably the better choice.
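As a quick check of how a line-break-oriented pattern differs from a blanket \s+ pattern, the sketch below runs both over a small indented XML snippet. The class name, method names and sample input are invented for illustration.

```java
// Hypothetical helper comparing two post-processing patterns.
final class XmlWhitespaceCleanup {

    // Drops line breaks, tabs, backspaces and form feeds, plus the spaces
    // (typically indentation) that directly follow them.
    static String stripLineBreaksAndIndent(String xml) {
        return xml.replaceAll("[\t\n\b\r\f]+ *", "");
    }

    // Drops all ASCII whitespace, including spaces inside text content.
    static String stripAllWhitespace(String xml) {
        return xml.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String xml = "<a>\n    <b>x y</b>\r\n</a>";
        System.out.println(stripLineBreaksAndIndent(xml)); // <a><b>x y</b></a>
        System.out.println(stripAllWhitespace(xml));       // <a><b>xy</b></a>
    }
}
```

Which one fits depends on whether spaces inside element text are significant; for serialized XML they usually are.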
