Iterating over two very large files simultaneously in Java
I need to merge two very large files (>1 GB each) by writing 4 lines from the first file into an output file, then 4 lines from the second, and so on until the end. Both files have the same number of lines, and that number is divisible by four. What's the most efficient way to do this in Java?
I love to use the Decorator pattern. Create two classes, each representing a BufferedReader instance. For instance:
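The code for this answer is not preserved on the page. Below is a minimal sketch of the idea, assuming UTF-8 input: a hypothetical `BlockReader` that extends `BufferedReader` (so it decorates any `Reader`) and adds a `nextBlock()` method returning the next 4 lines. Rather than two separate classes, it uses one class instantiated once per input file. `BlockReader`, `DecoratorMerge`, and the 64K buffer size are illustrative choices, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decorator: a BufferedReader over any Reader that adds one
// operation, reading the next block of blockSize lines.
class BlockReader extends BufferedReader {
    private final int blockSize;

    BlockReader(Reader in, int blockSize) {
        super(in, 1 << 16); // 64K-char buffer (assumed value; tune as needed)
        this.blockSize = blockSize;
    }

    /** Returns up to blockSize lines; an empty list means end of file. */
    List<String> nextBlock() throws IOException {
        List<String> block = new ArrayList<>(blockSize);
        String line;
        while (block.size() < blockSize && (line = readLine()) != null) {
            block.add(line);
        }
        return block;
    }
}

class DecoratorMerge {
    // One BlockReader instance per input file, reading 4-line blocks as UTF-8.
    static BlockReader open(Path p) throws IOException {
        return new BlockReader(
                new InputStreamReader(Files.newInputStream(p), StandardCharsets.UTF_8), 4);
    }

    static void merge(Path first, Path second, Path out) throws IOException {
        try (BlockReader a = open(first);
             BlockReader b = open(second);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (true) {
                List<String> blockA = a.nextBlock();
                List<String> blockB = b.nextBlock();
                if (blockA.isEmpty() && blockB.isEmpty()) {
                    break; // both files exhausted
                }
                // Write 4 lines from the first file, then 4 from the second.
                for (String s : blockA) { w.write(s); w.newLine(); }
                for (String s : blockB) { w.write(s); w.newLine(); }
            }
        }
    }
}
```

Only one block per file is held in memory at a time, so memory use stays constant regardless of file size.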
Try this and see how long it takes. Adjust n accordingly. If it's too slow, try using NIO.
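The code this answer refers to is not preserved on the page. A plain single-threaded sketch might look like the following, assuming `n` meant the reader/writer buffer size in characters (the original may have meant something else) and that the files are UTF-8. `BufferedInterleave`, `merge`, `linesPerBlock`, and `bufferChars` are hypothetical names; time a few runs with different `bufferChars` values to find a good setting.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedInterleave {
    /**
     * Copies linesPerBlock lines from each input in turn.
     * bufferChars stands in for the tunable "n" (an assumption about its meaning).
     */
    static void merge(Path first, Path second, Path out,
                      int linesPerBlock, int bufferChars) throws IOException {
        try (BufferedReader a = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(first), StandardCharsets.UTF_8), bufferChars);
             BufferedReader b = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(second), StandardCharsets.UTF_8), bufferChars);
             BufferedWriter w = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8), bufferChars)) {
            boolean more = true;
            while (more) {
                // Non-short-circuit | so both readers are always consulted.
                more = copyLines(a, w, linesPerBlock) | copyLines(b, w, linesPerBlock);
            }
        }
    }

    /** Copies up to count lines; returns false if the reader was already exhausted. */
    private static boolean copyLines(BufferedReader in, BufferedWriter w, int count) throws IOException {
        boolean copied = false;
        for (int i = 0; i < count; i++) {
            String line = in.readLine();
            if (line == null) {
                break;
            }
            w.write(line);
            w.newLine();
            copied = true;
        }
        return copied;
    }
}
```

For example, `BufferedInterleave.merge(a, b, out, 4, 1 << 16)` interleaves 4-line blocks using 64K-char buffers.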
Sorry if this idea was already given :)
I think the fastest way to do this is to create 3 threads. t1 and t2 read from the two input files; each has its own thread-safe queue, and each thread reads 4 lines at a time and puts them in its queue. t3 is the writing thread: it alternately takes blocks from the two queues and writes them to the new merged file. I think this is a good solution because it allows parallel I/O on all three files (and I/O is the bottleneck here) with minimal interaction between the threads.
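The three-thread design above can be sketched with `java.util.concurrent`, assuming UTF-8 input. Two reader tasks (t1, t2) put 4-line blocks into their own bounded `ArrayBlockingQueue`, and a writer task (t3) takes from them alternately. An empty list serves as the end-of-file sentinel, since real blocks are never empty. `ThreadedMerge` and the queue capacity of 1024 blocks are hypothetical choices; the bounded queues keep memory use capped even for multi-gigabyte inputs. Whether the parallel reads actually help depends on the storage: on a single disk the gain may be small.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedMerge {
    // Sentinel block: an empty list signals that a reader reached end of file.
    private static final List<String> EOF = Collections.emptyList();

    static void merge(Path first, Path second, Path out) throws Exception {
        BlockingQueue<List<String>> q1 = new ArrayBlockingQueue<>(1024);
        BlockingQueue<List<String>> q2 = new ArrayBlockingQueue<>(1024);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            Future<?> t1 = pool.submit(reader(first, q1));   // reads file 1
            Future<?> t2 = pool.submit(reader(second, q2));  // reads file 2
            Future<?> t3 = pool.submit(writer(out, q1, q2)); // writes merged file
            t1.get();
            t2.get();
            t3.get(); // rethrows any task failure as ExecutionException
        } finally {
            pool.shutdownNow();
        }
    }

    private static Callable<Void> reader(Path path, BlockingQueue<List<String>> queue) {
        return () -> {
            try (BufferedReader r = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                List<String> block = new ArrayList<>(4);
                String line;
                while ((line = r.readLine()) != null) {
                    block.add(line);
                    if (block.size() == 4) {
                        queue.put(block); // blocks if the writer falls behind
                        block = new ArrayList<>(4);
                    }
                }
                if (!block.isEmpty()) {
                    queue.put(block); // partial block; should not occur per the question
                }
            } finally {
                queue.put(EOF); // always unblock the writer, even on failure
            }
            return null;
        };
    }

    private static Callable<Void> writer(Path out, BlockingQueue<List<String>> q1,
                                         BlockingQueue<List<String>> q2) {
        return () -> {
            try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                boolean done1 = false;
                boolean done2 = false;
                while (!done1 || !done2) {
                    // Alternate: one block from file 1, then one from file 2.
                    if (!done1) done1 = writeBlock(q1.take(), w);
                    if (!done2) done2 = writeBlock(q2.take(), w);
                }
            }
            return null;
        };
    }

    /** Writes a block; returns true if it was the end-of-file sentinel. */
    private static boolean writeBlock(List<String> block, BufferedWriter w) throws IOException {
        if (block.isEmpty()) {
            return true;
        }
        for (String line : block) {
            w.write(line);
            w.newLine();
        }
        return false;
    }
}
```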