java.util.Scanner fails when reading a large file
I wrote a program that uses a Scanner to read lines from log files and parse each line to find something important. It is important that I read every line of the log file. I wrote the following code to scan each line:
Scanner s = new Scanner(new File("Large.log"));
while (s.hasNextLine()) {
    String line = s.nextLine();
    // do the processing of the log line
}
The code above behaves strangely: it stops reading after a seemingly random number of lines (roughly 1 million). I modified it to print the last line read and then looked that line up in the log file with Notepad++; there were still many lines after it. I also added a System.out.println(s.hasNextLine()) after the end of the while loop, and it prints false.
However, if I do the same thing with a BufferedReader, the program works fine. Is there some limitation in Java's java.util I/O classes?
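One way to narrow this down is to ask the Scanner why it stopped. Per the Scanner Javadoc, if the underlying source throws an IOException, Scanner assumes the end of input has been reached and keeps the exception available through ioException(). That would produce exactly the symptom above: hasNextLine() returns false with lines still left in the file. The sketch below assumes the cause is a byte sequence that is invalid in the chosen charset (a hypothesis, not something the question confirms). The names ScannerStopDemo, scanLines, readLines and writeSample are hypothetical, and the six-line sample file containing a stray 0xFF byte stands in for the real log. For comparison it uses BufferedReader over InputStreamReader (which FileReader is built on), and that reader replaces undecodable bytes with U+FFFD instead of failing.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerStopDemo {

    /** Lines read by Scanner plus the IOException it swallowed (null if none). */
    static final class ScanResult {
        final int lines;
        final IOException swallowed;

        ScanResult(int lines, IOException swallowed) {
            this.lines = lines;
            this.swallowed = swallowed;
        }
    }

    /** The question's loop, followed by a check of Scanner.ioException(). */
    static ScanResult scanLines(File file, String charsetName) {
        try (Scanner s = new Scanner(file, charsetName)) {
            int count = 0;
            while (s.hasNextLine()) {
                s.nextLine(); // process the log line here
                count++;
            }
            // A failed read looks like "end of input"; the cause is only visible
            // through ioException(), which returns null if nothing went wrong.
            return new ScanResult(count, s.ioException());
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Same loop with BufferedReader; InputStreamReader substitutes U+FFFD for bad bytes. */
    static int readLines(File file) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            int count = 0;
            while (r.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes six lines to a temp file; the fourth is a single byte that is not valid UTF-8. */
    static File writeSample() {
        try {
            Path p = Files.createTempFile("scanner-demo", ".log");
            p.toFile().deleteOnExit();
            byte[] head = "line 1\nline 2\nline 3\n".getBytes(StandardCharsets.US_ASCII);
            byte[] tail = "\nline 5\nline 6\n".getBytes(StandardCharsets.US_ASCII);
            byte[] all = new byte[head.length + 1 + tail.length];
            System.arraycopy(head, 0, all, 0, head.length);
            all[head.length] = (byte) 0xFF; // 0xFF never appears in valid UTF-8
            System.arraycopy(tail, 0, all, head.length + 1, tail.length);
            Files.write(p, all);
            return p.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = writeSample();
        ScanResult scanned = scanLines(f, "UTF-8");
        System.out.println("Scanner lines: " + scanned.lines + ", swallowed: " + scanned.swallowed);
        System.out.println("BufferedReader lines: " + readLines(f));
    }
}
```

On the sample file, Scanner should report fewer than six lines, with a MalformedInputException as the swallowed exception, while BufferedReader counts all six. If ioException() returns null on the real log file, this explanation does not apply and the cause lies elsewhere.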
Comments (2)
This sounds like a large-file support issue in your particular JVM implementation. On 32-bit operating systems, it is common for standard file I/O to fail on files larger than 4 GB. There are usually alternative versions of the file APIs that explicitly support large files, but whoever implements the JVM has to remember to use them. Out of curiosity, what OS are you using, and is it 64-bit?
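To answer that question and see whether the 4 GB limit is even in play, something like the following prints the relevant system properties and the size of the log file. This is a sketch: LargeFileCheck and exceedsFourGiB are hypothetical names, Large.log is the file name from the question, and sun.arch.data.model is a HotSpot-specific property that other JVMs may not set. Note that os.arch describes the architecture the JVM was built for, so a 32-bit JVM running on a 64-bit OS reports the 32-bit value.

```java
import java.io.File;

class LargeFileCheck {

    /** 4 GiB = 2^32 bytes, the threshold mentioned in the answer. */
    static final long FOUR_GIB = 1L << 32;

    static boolean exceedsFourGiB(long sizeInBytes) {
        return sizeInBytes > FOUR_GIB;
    }

    public static void main(String[] args) {
        File log = new File(args.length > 0 ? args[0] : "Large.log");
        System.out.println("os.name    = " + System.getProperty("os.name"));
        // Architecture of the running JVM, which may differ from the OS.
        System.out.println("os.arch    = " + System.getProperty("os.arch"));
        // HotSpot-specific; falls back to "unknown" when the property is absent.
        System.out.println("data model = " + System.getProperty("sun.arch.data.model", "unknown"));
        long size = log.length(); // File.length() returns 0 if the file does not exist
        System.out.println(log + ": " + size + " bytes, over 4 GiB: " + exceedsFourGiB(size));
    }
}
```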
I just wrote a 50-character string to a temporary file, repeated 5 million times, and Scanner works fine for me when I read the file line by line.
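The experiment described here can be reproduced roughly as follows. This is a sketch, not the answerer's actual code: the class and method names (ScannerLargeFileRepro, writeRepeated, countLines) are hypothetical, the 50-character string is arbitrary, and String.repeat requires Java 11 or later. The counting loop is the same hasNextLine()/nextLine() loop as in the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerLargeFileRepro {

    /** Writes {@code line} followed by '\n', {@code times} times, to a new temp file. */
    static Path writeRepeated(String line, int times) {
        try {
            Path p = Files.createTempFile("scanner-repro", ".log");
            p.toFile().deleteOnExit();
            try (BufferedWriter w = Files.newBufferedWriter(p, StandardCharsets.UTF_8)) {
                for (int i = 0; i < times; i++) {
                    w.write(line);
                    w.write('\n');
                }
            }
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts lines with the same loop as the question. */
    static long countLines(Path p) {
        try (Scanner s = new Scanner(p, "UTF-8")) {
            long count = 0;
            while (s.hasNextLine()) {
                s.nextLine();
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "0123456789".repeat(5); // 50 ASCII characters
        int times = 5_000_000;
        Path p = writeRepeated(line, times);
        System.out.println("wrote " + times + " lines, Scanner read " + countLines(p));
    }
}
```

With 5 million lines of 51 bytes each (50 characters plus the newline), the file is about 255 MB. That is well below the 4 GB limit discussed in the other answer, and a pure-ASCII file cannot trigger decoding errors, so a clean run here does not rule out either explanation for the original log.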
I see two possible problems in your case: