Parsing a huge plain text file
I have a huge text file (207 MB, 4 million lines) and I need to read it sequentially line by line.
Every line has this format: 20227993821NAME AND SURNAME NINIC NN08
For regular files I have been using the Java library's FileReader and BufferedReader, like this:
// directory holds the path of the file to read
FileReader dataFile = new FileReader(directory);
BufferedReader data = new BufferedReader(dataFile);
String s;
while ((s = data.readLine()) != null) {
    // do stuff
}
This works without problems, but with huge files it takes too long to process.
I wonder what the best practice would be in such cases (another library, different methods, etc.); anything would be helpful.
The file is issued periodically by a government agency and must be loaded into my software for data comparison.
Edit:
This code:
BufferedReader data = new BufferedReader(new FileReader(file));
String s;
int count = 0;
while ((s = data.readLine()) != null) {
    System.out.println(count + " - " + s);
    count++;
}
data.close();
executed in 19 minutes 30 seconds. I don't know why it took so long.
I have a 64-bit operating system and an i5 processor.

If I run
it prints
EDIT: If I print out each line, it slows down dramatically, because writing to the screen takes a long time. I have found the MS-DOS window to be especially slow.
I don't believe it is the reading of the file that is taking too long; it is what you are doing with each line that takes a long time.
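One way to check this claim is to time the read loop on its own, with no per-line printing. The following is only a minimal sketch, not the code from the answer above: the class name ReadTiming and the method countLines are hypothetical, and ISO-8859-1 is an assumed encoding chosen because it never fails on unexpected bytes (substitute the file's real charset if it is known).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTiming {

    // Reads the file line by line and returns the number of lines.
    // Nothing is printed inside the loop, so only the read cost is measured.
    static long countLines(Path file) throws IOException {
        long count = 0;
        // ISO-8859-1 is an assumption; use the file's actual encoding if known.
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ReadTiming <path-to-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        long lines = countLines(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // A single summary line is printed after the loop has finished.
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```

If this finishes quickly on the same file, the time is going into the per-line work (here, the System.out.println call) rather than into reading. The try-with-resources block also closes the reader even if an exception is thrown, which the explicit data.close() in the question does not guarantee.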