Java: parsing a string with many spaces

Posted 2025-01-05 19:37:13

I have a string with multiple spaces, but when I use the tokenizer it breaks it apart at all of those spaces. I need the tokens to contain those spaces. How can I utilize the StringTokenizer to return the values with the tokens I am splitting on?
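
For reference, StringTokenizer itself can return the delimiters. The three-argument constructor StringTokenizer(String str, String delim, boolean returnDelims) hands back each delimiter character as its own one-character token. The sketch below assumes only the ' ' character counts as a separator. It merges consecutive space tokens back into a single run, so joining the result reproduces the input. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerWithDelims {

    // Splits on spaces but keeps each run of spaces as one token.
    // Assumption: only ' ' is treated as a separator (not tabs or newlines).
    static List<String> tokenize(String input) {
        // returnDelims = true: every delimiter character is also returned,
        // as a separate token of length one.
        StringTokenizer st = new StringTokenizer(input, " ", true);
        List<String> tokens = new ArrayList<>();
        StringBuilder spaces = new StringBuilder();
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.equals(" ")) {
                // Accumulate consecutive single-space tokens into one run.
                spaces.append(t);
            } else {
                // A word ends the current run of spaces, if there is one.
                if (spaces.length() > 0) {
                    tokens.add(spaces.toString());
                    spaces.setLength(0);
                }
                tokens.add(t);
            }
        }
        // Flush a trailing run of spaces, if any.
        if (spaces.length() > 0) {
            tokens.add(spaces.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String foo = "this is      some  data      in   a string";
        for (String t : tokenize(foo)) {
            System.out.println("[" + t + "]");
        }
    }
}
```

Because nothing is dropped, String.join("", tokenize(foo)) gives back the original string.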

Comments (4)

幽梦紫曦~ 2025-01-12 19:37:13

You'll note in the docs for StringTokenizer that it is a legacy class whose use is discouraged in new code, and that String.split(regex) (or the java.util.regex package) is recommended instead. String.split(regex) is what you want:

String foo = "this is      some  data      in   a string";
String[] bar = foo.split("\\s+");
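
To make the behaviour concrete, here is a small sketch of split("\\s+") showing that the whitespace itself is thrown away. It also shows one edge case: leading whitespace is a positive-width match at index 0, so split keeps an empty first element. Calling trim() first avoids that. The class and method names are illustrative.

```java
import java.util.Arrays;

public class SplitOnWhitespace {

    // Splits on runs of whitespace; the whitespace itself is discarded.
    static String[] words(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        String foo = "this is      some  data      in   a string";
        System.out.println(Arrays.toString(words(foo)));
        // prints [this, is, some, data, in, a, string]

        // Leading whitespace yields an empty first element...
        System.out.println(Arrays.toString(words("  padded  text")));
        // prints [, padded, text]

        // ...which trimming the input first avoids.
        System.out.println(Arrays.toString(words("  padded  text".trim())));
        // prints [padded, text]
    }
}
```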

Edit to add: Or, if you have greater needs than a simple split, then use the Pattern and Matcher classes for more complex regular expression matching and extracting.
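
Here is a minimal sketch of the Pattern/Matcher route, assuming "words" means runs of non-whitespace (\S+). Matcher.find() walks the matches, and start()/end() give each word's offsets. The spacing between words can therefore be recovered from the offsets even though it is not returned as a token. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordOffsets {

    // \S+ matches one maximal run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Describes each word together with its [start, end) offsets.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            // group() is the matched word; start()/end() locate it in the input,
            // so the gap before the next word is next.start() - end().
            out.add(m.group() + " [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        String foo = "this is      some  data      in   a string";
        for (String line : describe(foo)) {
            System.out.println(line);
        }
    }
}
```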

Edit again: If you want to preserve the spaces, knowing a bit about regular expressions really helps:

String[] bar = foo.split("\\b");

This splits on word boundaries. Because \b matches a position rather than a character, nothing is removed, and each run of spaces between words is preserved as its own String:

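public class SplitOnWordBoundaries
{
    public static void main( String[] args )
    {
        String foo = "this is      some  data      in   a string";
        // \b is a zero-width word boundary, so the split consumes no characters:
        // words and the runs of spaces between them come back as separate elements.
        // (Java 8+ drops the empty leading element from the zero-width match at index 0.)
        String[] bar = foo.split("\\b");
        for (String s : bar)
        {
            System.out.print(s);
            // matches() must match the whole string, so no ^...$ anchors are needed
            if (s.matches("\\s+"))
            {
                System.out.println("\t<< " + s.length() + " spaces");
            }
            else
            {
                System.out.println();
            }
        }
    }
}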

Output:

this
        << 1 spaces
is
        << 6 spaces
some
        << 2 spaces
data
        << 6 spaces
in
        << 3 spaces
a
        << 1 spaces
string

触ぅ动初心 2025-01-12 19:37:13

Sounds like you may need to use regular expressions (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html) instead of StringTokenizer.
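
As an illustration of the regex route, here is a sketch using the alternation \S+|\s+. Every character of the input belongs to one of the two alternatives, so Matcher.find() returns words and whitespace runs alternately and nothing is lost. Unlike splitting on \b, punctuation stays inside the word: "it's" remains one token, whereas \b would separate the apostrophe. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {

    // Either a run of non-whitespace or a run of whitespace; together the
    // two alternatives cover every character of the input.
    private static final Pattern TOKEN = Pattern.compile("\\S+|\\s+");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokens("it's   a  test.")) {
            System.out.println("[" + t + "]");
        }
    }
}
```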

妄断弥空 2025-01-12 19:37:13

Use String.split("\\s+") instead of StringTokenizer.

Note that this only extracts the non-whitespace characters separated by at least one whitespace character. If you want leading/trailing whitespace characters included with the non-whitespace characters, that calls for a completely different solution!
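
One possible shape for such a solution is sketched below. It assumes whitespace should stay attached to the word before it. The split happens at a zero-width position using the lookbehind (?<=\s) and the lookahead (?=\S), so no characters are consumed. Any whitespace before the first word comes back as its own element. The class and method names are illustrative.

```java
public class SplitKeepTrailingSpace {

    // Zero-width split point: preceded by whitespace and followed by
    // non-whitespace, i.e. the start of every word except the first.
    static String[] wordsWithTrailingSpace(String input) {
        return input.split("(?<=\\s)(?=\\S)");
    }

    public static void main(String[] args) {
        String foo = "this is      some  data";
        for (String t : wordsWithTrailingSpace(foo)) {
            // Brackets make the trailing spaces visible.
            System.out.println("[" + t + "]");
        }
    }
}
```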

This requirement isn't clear from your original question, and there is an edit pending that tries to clarify it.

StringTokenizer in almost every non-contrived case is the wrong tool for the job.

你げ笑在眉眼 2025-01-12 19:37:13

I think it would be good to first use replaceAll to replace every run of multiple spaces with a single space, and then tokenize using split.
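
Here is a sketch of this two-step approach, assuming only the ' ' character (not tabs) separates the words. Like split("\\s+"), it discards the original spacing. A single split(" +") yields the same tokens in one pass. The class and method names are illustrative.

```java
import java.util.Arrays;

public class CollapseThenSplit {

    // Step 1: collapse each run of spaces to one space. Step 2: split on it.
    static String[] collapseThenSplit(String input) {
        return input.replaceAll(" +", " ").split(" ");
    }

    public static void main(String[] args) {
        String foo = "this is      some  data      in   a string";
        String[] twoStep = collapseThenSplit(foo);
        // Splitting directly on runs of spaces gives the same tokens.
        String[] oneStep = foo.split(" +");
        System.out.println(Arrays.toString(twoStep));
        // prints [this, is, some, data, in, a, string]
        System.out.println(Arrays.equals(twoStep, oneStep));
        // prints true
    }
}
```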
