
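Scanner vs. String.split() vs. StringTokenizer in Java: which should I use?

Posted 2024-07-16 15:48:31

I am currently using split() to scan through a file where each line has a number of strings delimited by '~'. I read somewhere that Scanner could do a better job with a long file, performance-wise, so I thought about checking it out.

My question is: would I have to create two instances of Scanner? That is, one to read a line and another one, based on that line, to get the tokens for a delimiter? If I have to do that, I doubt I would get any advantage from using it. Maybe I'm missing something here?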

够运 2024-07-23 15:48:31

I took some measurements of these in a single-threaded model, and here are the results I got:

Time metrics (ms)

Tokenizer | String.split() | while + substring | Scanner | Scanner with compiled pattern
4.0       | 5.1            | 1.2               | 0.5     | 0.1
4.4       | 4.8            | 1.1               | 0.1     | 0.1
3.5       | 4.7            | 1.2               | 0.1     | 0.1
3.5       | 4.7            | 1.1               | 0.1     | 0.1
3.5       | 4.7            | 1.1               | 0.1     | 0.1

The outcome is that Scanner gives the best performance. Now the same needs to be evaluated in a multithreaded mode! One of my seniors says that the Tokenizer causes a CPU spike and String.split does not.
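The answer does not show its benchmark code, so the following is only a guess at what the Tokenizer and while+SubString columns refer to: a minimal sketch, assuming "Tokenizer" means java.util.StringTokenizer and "while+SubString" means an indexOf/substring loop over one '~'-delimited line. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LineTokenizers {

    // StringTokenizer: every character of the delimiter string is a delimiter,
    // and empty tokens are skipped (e.g. "a~~b" yields a, b).
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "~");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // while + substring: find each '~' with indexOf and cut out the text
    // between delimiters; empty fields (including a trailing one) are kept.
    static List<String> withIndexOfSubstring(String line) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = line.indexOf('~', start)) != -1) {
            tokens.add(line.substring(start, end));
            start = end + 1;
        }
        tokens.add(line.substring(start)); // text after the last '~'
        return tokens;
    }

    public static void main(String[] args) {
        String line = "a~~b~c";
        System.out.println(withStringTokenizer(line));  // [a, b, c]
        System.out.println(withIndexOfSubstring(line)); // [a, , b, c]
    }
}
```

Note the different handling of empty fields: StringTokenizer drops them while the indexOf loop keeps them, so the two are not interchangeable for lines that can contain "~~".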

凉城已无爱 2024-07-23 15:48:31

To process lines you can use Scanner, and to get the tokens from each line you can use split:

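import java.io.File;
import java.util.Scanner;

// loc is the path of the input file; the enclosing method must handle or
// declare the FileNotFoundException that new Scanner(File) can throw.
try (Scanner scanner = new Scanner(new File(loc))) {
    while (scanner.hasNextLine()) {
        String[] tokens = scanner.nextLine().split("~");
        // do the processing for tokens here
    }
}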
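When tokenizing each line with split, it helps to know how empty fields are treated: split("~") keeps empty fields between delimiters but discards trailing empty strings, while split("~", -1) keeps every field. A small sketch with hypothetical helper names:

```java
import java.util.Arrays;

public class SplitTrailingEmpty {

    // Default limit (0): empty fields between delimiters are kept,
    // but trailing empty strings are removed from the result.
    static String[] splitDefault(String line) {
        return line.split("~");
    }

    // Negative limit: every field is kept, including trailing empty ones.
    static String[] splitKeepAll(String line) {
        return line.split("~", -1);
    }

    public static void main(String[] args) {
        String line = "a~~b~~";
        System.out.println(Arrays.toString(splitDefault(line))); // [a, , b]
        System.out.println(Arrays.toString(splitKeepAll(line))); // [a, , b, , ]
    }
}
```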

貪欢 2024-07-23 15:48:31

You can use the useDelimiter("~") method to let you iterate through the tokens on each line with hasNext()/next(), while still using hasNextLine()/nextLine() to iterate through the lines themselves.
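A minimal sketch of the useDelimiter approach. One caveat: with "~" as the only delimiter, line terminators are not delimiters, so a next() call that crosses a line break returns a token such as "b\nc". The sketch therefore reads lines with an outer Scanner and tokenizes each line with its own short-lived Scanner, which is the two-Scanner arrangement the question asks about. The class and method names are hypothetical, and a String stands in for the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerPerLine {

    // The outer Scanner walks the lines; an inner Scanner with
    // useDelimiter("~") splits each individual line into tokens.
    static List<List<String>> tokenizeLines(String text) {
        List<List<String>> result = new ArrayList<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                List<String> tokens = new ArrayList<>();
                try (Scanner fields = new Scanner(lines.nextLine()).useDelimiter("~")) {
                    while (fields.hasNext()) {
                        tokens.add(fields.next());
                    }
                }
                result.add(tokens);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeLines("a~b\nc~d~e")); // [[a, b], [c, d, e]]
    }
}
```

For a real file, the outer Scanner would be constructed from new File(loc) instead of a String.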

EDIT: If you're going to do a performance comparison, you should pre-compile the regex when you do the split() test:

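import java.util.regex.Pattern;

// bufferedReader is an already-open BufferedReader; readLine() can throw IOException.
Pattern splitRegex = Pattern.compile("~");  // compiled once, reused for every line
String line;
while ((line = bufferedReader.readLine()) != null) {
    String[] tokens = splitRegex.split(line);
    // etc.
}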

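If you use String#split(String regex), the regex is normally recompiled on every call; since Java 7, though, a single-character delimiter that is not a regex metacharacter, such as "~", takes a fast path in String.split that skips the regex engine. (Scanner caches the regexes it compiles from strings.) If you pre-compile the pattern, I wouldn't expect to see much difference in performance.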
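A rough way to compare the two split variants yourself, sketched under the assumption that a simple System.nanoTime loop in one JVM is enough for a first look; it is not a rigorous benchmark, since warm-up and JIT effects can dominate small timings. The input lines and names are hypothetical, and no particular timings are implied.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitComparison {

    private static final Pattern TILDE = Pattern.compile("~"); // compiled once

    static String[] splitWithString(String line) {
        return line.split("~");
    }

    static String[] splitWithPattern(String line) {
        return TILDE.split(line);
    }

    public static void main(String[] args) {
        // Hypothetical input: 100,000 identical '~'-delimited lines.
        String[] lines = new String[100_000];
        Arrays.fill(lines, "alpha~beta~gamma~delta");

        long t0 = System.nanoTime();
        int count1 = 0;
        for (String line : lines) {
            count1 += splitWithString(line).length;
        }
        long t1 = System.nanoTime();
        int count2 = 0;
        for (String line : lines) {
            count2 += splitWithPattern(line).length;
        }
        long t2 = System.nanoTime();

        // Token counts are printed so the work cannot be optimized away.
        System.out.printf("String.split:  %d tokens, %.1f ms%n", count1, (t1 - t0) / 1e6);
        System.out.printf("Pattern.split: %d tokens, %.1f ms%n", count2, (t2 - t1) / 1e6);
    }
}
```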
