java.util.Scanner malfunctions when reading large files

Posted on 2024-10-16 22:49:03

I wrote a program that uses a Scanner to read lines from log files and parses each line to find something important. It is important that I read every line of the log file. I wrote the following code to scan each line:

Scanner s = new Scanner(new File("Large.log"));
while(s.hasNextLine())
{
    String line = s.nextLine();
    //do the processing of the log line
}

The above code behaves strangely. It stops reading after a seemingly random number of lines (roughly 1 million). I modified the code to print the last line read and compared it against the log file in Notepad++: many lines remained in the file after that line. I also added System.out.println(s.hasNextLine()) after the end of the while loop, and it prints false.
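One thing worth ruling out when hasNextLine() returns false early: Scanner does not propagate an IOException from its underlying source. It treats the error as end of input and keeps the exception available through ioException(). A read error partway through the file (for example, possibly a byte sequence that is invalid in the charset being decoded) could therefore look like a silent stop. The sketch below is only a diagnostic under that assumption, not a confirmed explanation of this case; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper: counts lines and surfaces any error Scanner swallowed.
class ScannerDiagnostics {

    static long countLines(File file) throws IOException {
        long lines = 0;
        try (Scanner s = new Scanner(file)) {
            while (s.hasNextLine()) {
                s.nextLine();
                lines++;
            }
            // Scanner records the most recent IOException from its source
            // instead of throwing it, then behaves as if the input had ended.
            IOException swallowed = s.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerDiagnostics <log file>");
            return;
        }
        System.out.println(countLines(new File(args[0])) + " lines read");
    }
}
```

If this throws, the exception type indicates what interrupted the read. If it returns a count smaller than the number of lines in the file without throwing, the cause lies elsewhere.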

However, if I do the same thing with a BufferedReader, the program works fine. Is there some limitation in Java's util I/O classes?
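For comparison, a BufferedReader version of the same loop might look like the sketch below. The question does not show the actual BufferedReader code, so the FileReader wrapper and the class and method names here are assumptions. Unlike Scanner, readLine() throws an IOException if the underlying read fails, so an error cannot be mistaken for the end of the file.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical equivalent of the Scanner loop, using BufferedReader.
class BufferedLogReader {

    static long processLines(File file) throws IOException {
        long lines = 0;
        // FileReader decodes with the platform default charset.
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            // readLine() returns null only at the real end of the stream.
            while ((line = reader.readLine()) != null) {
                // do the processing of the log line
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BufferedLogReader <log file>");
            return;
        }
        System.out.println(processLines(new File(args[0])) + " lines processed");
    }
}
```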

Comments (2)

川水往事 2024-10-23 22:49:04

This sounds like a large-file support issue in your particular JVM implementation. It is common for standard file I/O to fail on files larger than 4 GB on 32-bit operating systems. There are typically alternative versions of the file APIs that explicitly support large files, but whoever implemented the JVM would have to remember to use them. Out of curiosity, what OS are you using, and is it 64-bit?
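A quick way to gather the facts this answer asks about is to print the file size and the platform details the JVM reports. This is a minimal sketch; the class name, the helper, and the use of 4 GiB as the cutoff are assumptions made for illustration.

```java
import java.io.File;

// Hypothetical check: is the file past the 4 GB mark, and what platform is this?
class LargeFileCheck {

    static final long FOUR_GIB = 4L * 1024 * 1024 * 1024;

    static boolean exceedsFourGiB(long sizeInBytes) {
        return sizeInBytes > FOUR_GIB;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java LargeFileCheck <log file>");
            return;
        }
        File log = new File(args[0]);
        long size = log.length(); // 0 if the file does not exist
        System.out.println("size in bytes: " + size);
        System.out.println("larger than 4 GiB: " + exceedsFourGiB(size));
        // Standard system properties describing the platform as seen by the JVM.
        System.out.println("os.name: " + System.getProperty("os.name"));
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```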

弥枳 2024-10-23 22:49:04

I just dumped a 50-character string to a temporary file, repeating it 5 million times, and Scanner works fine for me when I read the file line by line.

I see two possible problems in your case:

  1. Maybe you are trying to read a huge line that exceeds Scanner's internal buffer size for reading a line?
  2. Though it is unlikely, I hope a different process or thread is not modifying the same file concurrently.
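The test described in this answer can be reproduced roughly as follows. The answer does not include its code, so this is a reconstruction under assumptions: the 50-character content, the UTF-8 encoding, and the class and method names are all made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Hypothetical reproduction: write the same line many times, then count with Scanner.
class ScannerRepro {

    // Exactly 50 characters (the digits 0-9 repeated five times).
    static final String LINE = "01234567890123456789012345678901234567890123456789";

    static long writeThenCount(int repetitions) throws IOException {
        Path tmp = Files.createTempFile("scanner-repro", ".log");
        try {
            try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (int i = 0; i < repetitions; i++) {
                    w.write(LINE);
                    w.newLine();
                }
            }
            long count = 0;
            try (Scanner s = new Scanner(tmp, "UTF-8")) {
                while (s.hasNextLine()) {
                    s.nextLine();
                    count++;
                }
            }
            return count;
        } finally {
            Files.deleteIfExists(tmp); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerRepro <repetitions>");
            return;
        }
        int repetitions = Integer.parseInt(args[0]);
        System.out.println(writeThenCount(repetitions) + " of " + repetitions + " lines read");
    }
}
```

Running it with 5000000 as the argument mirrors the size used in the answer. A returned count equal to the number of repetitions means Scanner read every line of this well-formed file, which would point toward something specific to the original log rather than to file length alone.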