Scanner vs. StringTokenizer vs. String.Split

Posted on 2024-07-16 03:30:56

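I just learned about Java's Scanner class, and now I'm wondering how it compares with, or competes against, StringTokenizer and String.split(). I know that StringTokenizer and String.split() only work on Strings, so why would I want to use a Scanner for a String? Is Scanner just intended to be a one-stop shop for splitting?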


蓬勃野心 2024-07-23 03:30:56

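They're essentially horses for courses.

Scanner is designed for cases where you need to parse a string, pulling out data of different types. It's very flexible, but arguably doesn't give you the simplest API for just getting an array of strings delimited by a particular expression.
String.split() and Pattern.split() give you an easy syntax for doing the latter, but that's essentially all they do. If you want to parse the resulting strings, or change the delimiter halfway through depending on a particular token, they won't help you with that.
StringTokenizer is even more restrictive than String.split(), and also a bit fiddlier to use. It is essentially designed for pulling out tokens delimited by fixed substrings. Because of this restriction, it's about twice as fast as String.split(). (See my comparison of String.split() and StringTokenizer.) It also predates the regular expressions API, of which String.split() is a part.

You'll note from my timings that String.split() can still tokenize thousands of strings in a few milliseconds on a typical machine. In addition, it has the advantage over StringTokenizer that it gives you the output as a string array, which is usually what you want. Using an Enumeration, as provided by StringTokenizer, is too "syntactically fussy" most of the time. From this point of view, StringTokenizer is a bit of a waste of space nowadays, and you may as well just use String.split().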
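To make the "syntactically fussy" point concrete, here is a minimal sketch that tokenizes the same string both ways. The class name, method names, and sample data are illustrative assumptions, not taken from the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; names and sample data are for illustration only.
final class TokenizeComparison {

    // String.split(): a single call, and the result is already an array.
    static List<String> withSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    // StringTokenizer: tokens are pulled out one at a time, Enumeration-style,
    // so collecting them needs an explicit loop.
    static List<String> withTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "alpha,beta,gamma";
        System.out.println(withSplit(input));     // [alpha, beta, gamma]
        System.out.println(withTokenizer(input)); // [alpha, beta, gamma]
    }
}
```

For simple input like this the two produce the same tokens; the difference is only in how much code it takes to get them into a collection.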


尘世孤行 2024-07-23 03:30:56

Let's start by eliminating StringTokenizer. It is getting old and doesn't even support regular expressions. Its documentation states:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

So let's throw it out right away. That leaves split() and Scanner. What's the difference between them?

For one thing, split() simply returns an array, which makes it easy to use a foreach loop:

for (String token : input.split("\\s+")) { ... }

Scanner is built more like a stream:

while (myScanner.hasNext()) {
    String token = myScanner.next();
    ...
}

or

while (myScanner.hasNextDouble()) {
    double token = myScanner.nextDouble();
    ...
}

(It has a rather large API, so don't think that it's always restricted to such simple things.)

This stream-style interface can be useful for parsing simple text files or console input, when you don't have (or can't get) all the input before starting to parse.

Personally, the only time I can remember using Scanner is for school projects, when I had to get user input from the command line. It makes that sort of operation easy. But if I have a String that I want to split up, it's almost a no-brainer to go with split().
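As a rough illustration of the stream-style use case, the sketch below reads tokens from an InputStream and sums whatever numbers it finds. The ByteArrayInputStream stands in for a file or console, and the class name, method name, and sample text are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the input text is made up for illustration.
final class ScannerStreamDemo {

    // Sums every token that parses as a double and skips everything else.
    static double sumNumbers(InputStream in) {
        double sum = 0.0;
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            // Fix the locale so "2.5" parses regardless of the machine's default.
            sc.useLocale(Locale.US);
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    sum += sc.nextDouble();
                } else {
                    sc.next(); // not a number: consume the token and move on
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Stand-in for System.in or a FileInputStream.
        InputStream in = new ByteArrayInputStream(
                "width 2.5\nheight 4\ndepth 1.5\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(sumNumbers(in)); // 8.0
    }
}
```

The loop never needs the whole input in memory at once, which is the main thing split() cannot offer.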


心碎无痕… 2024-07-23 03:30:56

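StringTokenizer was always there. It is the fastest of the three, but its enumeration-like idiom might not look as elegant as the others.

split arrived in JDK 1.4. It is slower than the tokenizer but easier to use, since it can be called directly on the String class.

Scanner arrived in JDK 1.5. It is the most flexible and fills a long-standing gap in the Java API by providing an equivalent of C's famous scanf function family.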


世界如花海般美丽 2024-07-23 03:30:56

Split is slow, but not as slow as Scanner. StringTokenizer is faster than split. However, I found that I could roughly double the speed by trading away some flexibility, which is what I did in JFastParser: https://github.com/hughperkins/jfastparser

Testing on a string containing one million doubles:

Scanner: 10642 ms
Split: 715 ms
StringTokenizer: 544 ms
JFastParser: 290 ms
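For anyone who wants to try a comparison like this locally, the following is a naive timing sketch, not the harness used for the numbers above. It assumes space-separated doubles, uses a smaller default input than one million to keep the run short, does no JIT warm-up, and leaves out JFastParser because that is an external library. All class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.function.ToDoubleFunction;

// Hypothetical micro-benchmark sketch; timings will vary by machine and JDK.
final class ParseDoublesBenchmark {

    // Builds "0.0 0.5 1.0 ... " with a trailing space after every number.
    static String buildInput(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(i * 0.5).append(' ');
        }
        return sb.toString();
    }

    static double sumWithScanner(String input) {
        double sum = 0;
        try (Scanner sc = new Scanner(input)) {
            sc.useLocale(Locale.US); // "." as the decimal separator
            while (sc.hasNextDouble()) {
                sum += sc.nextDouble();
            }
        }
        return sum;
    }

    static double sumWithSplit(String input) {
        double sum = 0;
        for (String token : input.split(" ")) {
            sum += Double.parseDouble(token);
        }
        return sum;
    }

    static double sumWithTokenizer(String input) {
        double sum = 0;
        StringTokenizer st = new StringTokenizer(input, " ");
        while (st.hasMoreTokens()) {
            sum += Double.parseDouble(st.nextToken());
        }
        return sum;
    }

    // Runs one parser once and prints the elapsed wall-clock time.
    static void time(String label, ToDoubleFunction<String> parser, String input) {
        long start = System.nanoTime();
        double sum = parser.applyAsDouble(input);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + ms + " ms (sum=" + sum + ")");
    }

    public static void main(String[] args) {
        // Pass 1000000 as the first argument to match the test size quoted above.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        String input = buildInput(count);
        time("Scanner", ParseDoublesBenchmark::sumWithScanner, input);
        time("Split", ParseDoublesBenchmark::sumWithSplit, input);
        time("StringTokenizer", ParseDoublesBenchmark::sumWithTokenizer, input);
    }
}
```

Printing the sum alongside each timing is a cheap sanity check that all three parsers actually read the same values. For serious measurements a proper benchmarking harness would be a better choice than single System.nanoTime() readings.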


左耳近心 2024-07-23 03:30:56

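If you have a String object you want to tokenize, favor String's split method over a StringTokenizer. If you're parsing text data from a source outside your program, such as a file or the user, that's where a Scanner comes in handy.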


黑凤梨 2024-07-23 03:30:56

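String.split seems to be much slower than StringTokenizer. The only advantage of split is that you get an array of the tokens. You can also use any regular expression in split.
org.apache.commons.lang.StringUtils has a split method that works much faster than either of the two, namely StringTokenizer or String.split.
But the CPU utilization of all three is nearly the same. So we still need a method that is less CPU intensive, which I have not yet been able to find.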


独行侠 2024-07-23 03:30:56

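I recently did some experiments on the poor performance of String.split() in highly performance-sensitive situations. You may find this useful.

Hidden evils of Java's String.split() and replace()

The gist is that String.split() compiles a regular expression pattern each time it is called, and can thus slow down your program compared to precompiling a Pattern object once and using it directly to operate on the String.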
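The precompiled-Pattern approach can be sketched as below. The class name, field name, delimiter, and sample lines are assumptions for illustration. Note also that OpenJDK's String.split() has a fast path that bypasses the regex engine for simple single-character delimiters, so the difference is most likely to show up with real regular expressions such as the one used here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing a Pattern compiled once and reused.
final class PrecompiledSplit {

    // Compiled a single time when the class is loaded, then shared by all calls.
    private static final Pattern COMMA_WITH_SPACES = Pattern.compile("\\s*,\\s*");

    // Equivalent in result to line.split("\\s*,\\s*"), minus the per-call compile.
    static String[] splitFields(String line) {
        return COMMA_WITH_SPACES.split(line);
    }

    public static void main(String[] args) {
        String[] lines = {"a, b ,c", "1 ,2,  3"};
        for (String line : lines) {
            System.out.println(String.join("|", splitFields(line)));
        }
        // prints:
        // a|b|c
        // 1|2|3
    }
}
```

Pattern instances are immutable and safe to share between threads, which is what makes the static final field a reasonable place to keep one.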


街角迷惘 2024-07-23 03:30:56

For the default scenarios I would suggest Pattern.split() as well. But if you need maximum performance (especially on Android, where all the solutions I tested were quite slow) and you only need to split on a single char, this is the method I now use:

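import java.util.ArrayList;

public static ArrayList<String> splitBySingleChar(final char[] s,
        final char splitChar) {
    final ArrayList<String> result = new ArrayList<String>();
    final int length = s.length;
    int offset = 0; // start index of the token currently being scanned
    int count = 0;  // number of chars collected for that token so far
    for (int i = 0; i < length; i++) {
        if (s[i] == splitChar) {
            // Delimiter hit: emit the pending token, skipping empty ones.
            if (count > 0) {
                result.add(new String(s, offset, count));
            }
            offset = i + 1;
            count = 0;
        } else {
            count++;
        }
    }
    // Emit the trailing token if the input did not end with a delimiter.
    if (count > 0) {
        result.add(new String(s, offset, count));
    }
    return result;
}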

Use "abc".toCharArray() to get the char array for a String. For example:

String s = "     a bb   ccc  dddd eeeee  ffffff    ggggggg ";
ArrayList<String> result = splitBySingleChar(s.toCharArray(), ' ');


女中豪杰 2024-07-23 03:30:56

One important difference is that both String.split() and Scanner can produce empty strings, but StringTokenizer never does.

For example:

String str = "ab cd  ef";

StringTokenizer st = new StringTokenizer(str, " ");
for (int i = 0; st.hasMoreTokens(); i++) System.out.println("#" + i + ": " + st.nextToken());

String[] split = str.split(" ");
for (int i = 0; i < split.length; i++) System.out.println("#" + i + ": " + split[i]);

Scanner sc = new Scanner(str).useDelimiter(" ");
for (int i = 0; sc.hasNext(); i++) System.out.println("#" + i + ": " + sc.next());

Output:

//StringTokenizer
#0: ab
#1: cd
#2: ef
//String.split()
#0: ab
#1: cd
#2: 
#3: ef
//Scanner
#0: ab
#1: cd
#2: 
#3: ef

This is because the delimiter for String.split() and Scanner.useDelimiter() is not just a string, but a regular expression. We can replace the delimiter " " with " +" in the example above to make them behave like StringTokenizer.
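The " +" suggestion can be checked with a small sketch like the one below, which reuses the sample string from the example above. The class and helper names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

// Hypothetical demo class comparing " " and " +" as delimiters.
final class DelimiterRunDemo {

    // StringTokenizer treats the delimiter as a set of chars, not a regex.
    static List<String> tokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, " ");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // String.split() interprets its argument as a regular expression.
    static List<String> split(String s, String regex) {
        return Arrays.asList(s.split(regex));
    }

    // Scanner.useDelimiter(String) also takes a regular expression.
    static List<String> scanner(String s, String regex) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(s).useDelimiter(regex)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "ab cd  ef"; // two spaces between "cd" and "ef"
        System.out.println(tokenizer(str));     // [ab, cd, ef]
        System.out.println(split(str, " "));    // [ab, cd, , ef]
        System.out.println(split(str, " +"));   // [ab, cd, ef]
        System.out.println(scanner(str, " +")); // [ab, cd, ef]
    }
}
```

One caveat worth keeping in mind: the match is only equivalent for input like this. A string that starts with a space still gives split(" +") a leading empty element, whereas StringTokenizer silently skips leading delimiters.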
