Iterating over two very large files simultaneously in Java

Posted on 2024-12-07 19:10:19

I need to merge two very large files (>1 GB each) by writing 4 lines from the first into an output file, then writing 4 lines from the second, and so on until the end. Both files have the same number of lines, and that number is divisible by four. What is the most efficient way to do this in Java?

Comments (3)

年华零落成诗 2024-12-14 19:10:19

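I like to use the Decorator pattern. Create two classes, each wrapping a BufferedReader instance.

For instance,

import java.io.BufferedReader;
import java.io.IOException;

class First
{
    BufferedReader br;
    // ...
    // Returns the next line, or null at end of file.
    public String getLine() throws IOException
    {
        return br.readLine();
    }
}

class Second
{
    BufferedReader br;
    // ...
    // Reads the next four lines and returns them; entries are null past end of file.
    public String[] getLines() throws IOException
    {
        String[] lines = new String[4];
        for (int i = 0; i < lines.length; i++)
        {
            lines[i] = br.readLine();
        }
        return lines;
    }
}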
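
To show how such a wrapper might be used for the actual merge, here is a minimal sketch. The names `BlockMergeSketch`, `readBlock`, and `merge` are illustrative and not part of the answer above, and in-memory strings stand in for the real files so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public class BlockMergeSketch {

    // Hypothetical helper: reads exactly `size` lines, or returns null if the
    // reader is already at end of input.
    public static String[] readBlock(BufferedReader br, int size) throws IOException {
        String[] block = new String[size];
        for (int i = 0; i < size; i++) {
            block[i] = br.readLine();
            if (block[i] == null) {
                if (i == 0) {
                    return null; // clean end of input
                }
                throw new IOException("line count is not a multiple of " + size);
            }
        }
        return block;
    }

    // Alternates blocks of four lines: first input, then second input.
    public static void merge(BufferedReader first, BufferedReader second, BufferedWriter out)
            throws IOException {
        String[] a;
        while ((a = readBlock(first, 4)) != null) {
            String[] b = readBlock(second, 4);
            if (b == null) {
                throw new IOException("second input is shorter than the first");
            }
            for (String s : a) { out.write(s); out.newLine(); }
            for (String s : b) { out.write(s); out.newLine(); }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        merge(new BufferedReader(new StringReader("a1\na2\na3\na4\n")),
              new BufferedReader(new StringReader("b1\nb2\nb3\nb4\n")),
              new BufferedWriter(sink));
        System.out.print(sink); // four "a" lines followed by four "b" lines
    }
}
```

For real files, the two `StringReader`s would be replaced by `FileReader`s and the `StringWriter` by a `FileWriter`; only one block per input is held in memory at a time, so file size should not matter.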

娇俏 2024-12-14 19:10:19

Try this and see how long it takes. Adjust n (the number of lines written to each generated test file) accordingly. If it's too slow, try using NIO.

import java.io.*;

// Simple stopwatch: records its creation time and reports elapsed milliseconds.
class Time {
    final long t0 = System.currentTimeMillis();
    long dt() {
        return System.currentTimeMillis() - t0;
    }
}

public class Main {
    static final String lineSeparator = System.getProperty("line.separator");

    // Creates two test files of n lines each ("foo" lines and "bar" lines).
    public static void create(File file1, File file2, int n) throws IOException {
        try (BufferedWriter bw1 = new BufferedWriter(new FileWriter(file1))) {
            write(bw1, n, "foo");
        }
        try (BufferedWriter bw2 = new BufferedWriter(new FileWriter(file2))) {
            write(bw2, n, "bar");
        }
    }

    private static void write(BufferedWriter bw, int n, String line) throws IOException {
        for (int i = 0; i < n; i++) {
            bw.write(line);
            bw.write(lineSeparator);
        }
    }

    // Writes the line already read, then reads and writes the next three lines from br.
    // Relies on the line count being divisible by four, as stated in the question.
    private static void write4(BufferedReader br, BufferedWriter bw, String line) throws IOException {
        bw.write(line);
        bw.write(lineSeparator);
        for (int i = 0; i < 3; i++) {
            line = br.readLine();
            bw.write(line);
            bw.write(lineSeparator);
        }
    }

    public static void main(String[] args) throws Exception {
        File file1 = new File("file1");
        File file2 = new File("file2");
        if (!file1.exists()) {
            create(file1, file2, 10000000);
        }
        File file3 = new File("file3");
        Time time = new Time();
        // try-with-resources closes all three streams even if an exception is thrown.
        try (BufferedReader br1 = new BufferedReader(new FileReader(file1));
             BufferedReader br2 = new BufferedReader(new FileReader(file2));
             BufferedWriter bw = new BufferedWriter(new FileWriter(file3))) {
            String line1, line2;
            while ((line1 = br1.readLine()) != null) {
                write4(br1, bw, line1);
                line2 = br2.readLine();
                write4(br2, bw, line2);
            }
        }
        System.out.println(time.dt() / 1000. + " s.");
    }
}
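
If "nio" is read as the `java.nio.file` API, a variant could look roughly like the sketch below. It is an illustration under assumptions, not a measured improvement: the class name `NioMergeSketch`, the helper `copyLines`, the temp-file names, and the UTF-8 encoding are all choices made here, and whether it is faster than the `java.io` version would have to be timed on the real data.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class NioMergeSketch {

    // Alternately copies four lines from in1 and four lines from in2 into out.
    // Assumes both inputs are UTF-8 text (an assumption, not stated in the question).
    public static void merge(Path in1, Path in2, Path out) throws IOException {
        try (BufferedReader r1 = Files.newBufferedReader(in1, StandardCharsets.UTF_8);
             BufferedReader r2 = Files.newBufferedReader(in2, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (copyLines(r1, w, 4)) {
                if (!copyLines(r2, w, 4)) {
                    throw new IOException("second file is shorter than the first");
                }
            }
        }
    }

    // Copies `count` lines; returns false if the reader was already at end of input.
    static boolean copyLines(BufferedReader r, BufferedWriter w, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            String line = r.readLine();
            if (line == null) {
                if (i == 0) {
                    return false;
                }
                throw new IOException("line count is not a multiple of " + count);
            }
            w.write(line);
            w.newLine();
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Small demo files in a temp directory stand in for the real inputs.
        Path dir = Files.createTempDirectory("merge-demo");
        Path a = dir.resolve("a.txt");
        Path b = dir.resolve("b.txt");
        Path out = dir.resolve("out.txt");
        Files.write(a, Arrays.asList("a1", "a2", "a3", "a4"), StandardCharsets.UTF_8);
        Files.write(b, Arrays.asList("b1", "b2", "b3", "b4"), StandardCharsets.UTF_8);
        merge(a, b, out);
        List<String> merged = Files.readAllLines(out, StandardCharsets.UTF_8);
        System.out.println(merged);
    }
}
```

Unlike the version above, this sketch fails with an `IOException` instead of a `NullPointerException` when a file ends in the middle of a four-line block.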

記憶穿過時間隧道 2024-12-14 19:10:19

Sorry if this idea was already given :)

I think the fastest way to do this is to create 3 threads. t1 and t2 read from the two input files; each has its own thread-safe queue, and each repeatedly reads 4 lines and puts them in its queue. t3 is the writing thread: it takes nodes alternately from the two queues and writes them to the new merged file. I think this is a good solution because it allows parallel I/O on all three files (and I/O is the bottleneck here) with minimal interaction between the threads.
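
The three-thread idea might be sketched as follows. This is only an illustration under assumptions: the class name `ThreadedMergeSketch`, the `EOF` sentinel, the queue capacity of 1024, and the use of the calling thread as the writer (t3) are choices made here, and in-memory strings stand in for the files. It relies on the question's guarantee that both inputs have the same line count, divisible by four, and it does not propagate reader failures to the writer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreadedMergeSketch {
    private static final int BLOCK = 4;
    // Sentinel object (compared by identity) that marks the end of an input.
    private static final String[] EOF = new String[0];

    // Reader thread (t1 or t2): queues blocks of four lines, then the EOF sentinel.
    static Thread startReader(BufferedReader in, BlockingQueue<String[]> queue) {
        Thread t = new Thread(() -> {
            try {
                String first;
                while ((first = in.readLine()) != null) {
                    String[] block = new String[BLOCK];
                    block[0] = first;
                    for (int i = 1; i < BLOCK; i++) {
                        block[i] = in.readLine();
                    }
                    queue.put(block); // blocks while the queue is full
                }
                queue.put(EOF);
            } catch (IOException | InterruptedException e) {
                // Sketch only: a real version must also signal the writer here.
                throw new RuntimeException(e);
            }
        });
        t.setDaemon(true); // do not keep the JVM alive if the merge is abandoned
        t.start();
        return t;
    }

    // Writer (t3, here the calling thread): takes blocks alternately from the two queues.
    public static void merge(BufferedReader in1, BufferedReader in2, Writer out)
            throws IOException, InterruptedException {
        BlockingQueue<String[]> q1 = new ArrayBlockingQueue<>(1024);
        BlockingQueue<String[]> q2 = new ArrayBlockingQueue<>(1024);
        Thread t1 = startReader(in1, q1);
        Thread t2 = startReader(in2, q2);
        BufferedWriter w = new BufferedWriter(out);
        while (true) {
            String[] a = q1.take();
            if (a == EOF) {
                break;
            }
            String[] b = q2.take();
            if (b == EOF) {
                throw new IOException("second input is shorter than the first");
            }
            writeBlock(w, a);
            writeBlock(w, b);
        }
        w.flush();
        t1.join();
        t2.join();
    }

    private static void writeBlock(BufferedWriter w, String[] block) throws IOException {
        for (String line : block) {
            w.write(line);
            w.newLine();
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter sink = new StringWriter();
        merge(new BufferedReader(new StringReader("a1\na2\na3\na4\n")),
              new BufferedReader(new StringReader("b1\nb2\nb3\nb4\n")),
              sink);
        System.out.print(sink);
    }
}
```

The bounded queues keep memory use capped when one reader runs ahead of the writer. Whether the extra threads actually beat the single-threaded loop depends on the disks involved, so it is worth timing both.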
