在 Java 中使用 FileChannels 连接大文件的方法是什么?

发布于 2024-11-08 13:00:53 字数 2497 浏览 0 评论 0原文

我想找出两种方法中哪一种更好,用于在 Java 中连接文本文件。如果有人有一些关于内核级别发生的事情的见解,可以解释这些写入 FileChannel 的方法之间的差异,我将不胜感激。

根据我从文档和其他 Stack Overflow 对话中了解到的情况, allocateDirect 直接在驱动器上分配空间,并且主要避免使用 RAM。我担心如果文件输入很大(例如 1GB),使用 allocateDirect 创建的 ByteBuffer 可能会溢出或无法分配。在我们的软件开发过程中,我保证该文件不会大于 2 GB;但未来它可能会达到 10 或 20GB。

我观察到,transferFrom 循环从来不会多次执行循环...所以它似乎成功地一次写入整个 infile;但我还没有用大于 60MB 的文件对其进行测试。不过,我循环了,因为文档指定不能保证一次会写入多少内容。在我的系统上,transferFrom 只能接受 int32 作为其计数参数,因此我将无法指定一次传输超过 2GB 的数据...同样,内核专业知识将帮助我理解。

预先感谢您的帮助!

使用 ByteBuffer

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
        //write the stringBuffer so it goes in the output file first:
        buff.put(sb.toString().getBytes());

        //create the FileChannels:
        inChan  = new RandomAccessFile(infile,  "r" ).getChannel();
        outChan = new RandomAccessFile(outfile, "rw").getChannel();

        //read the infile in to the buffer:
        inChan.read(buff);

        // prep the buffer:
        buff.flip();

        // write the buffer out to the file via the FileChannel:
        outChan.write(buff);
        inChan.close();
        outChan.close();
     } catch...etc

}

使用 trasferTo(或 transferFrom)

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        //write the stringBuffer so it goes in the output file first:    
        PrintWriter  fw = new PrintWriter(outfile);
        fw.write(sb.toString());
        fw.flush();
        fw.close();

        // create the channels appropriate for appending:
        outChan = new FileOutputStream(outfile, true).getChannel();
        inChan  = new RandomAccessFile(infile, "r").getChannel();

        long startSize = outfile.length();
        long inFileSize = infile.length();
        long bytesWritten = 0;

        //set the position where we should start appending the data:
        outChan.position(startSize);
        Byte startByte = outChan.position();

        while(bytesWritten < length){ 
            bytesWritten += outChan.transferFrom(inChan, startByte, (int) inFileSize);
            startByte = bytesWritten + 1;
        }

        inChan.close();
        outChan.close();
    } catch ... etc

I want to find out what method is better of two that I have come up with for concatenating my text files in Java. If someone has some insight they can share about what goes on at the kernel level that explains the difference between these methods of writing to a FileChannel, I would greatly appreciate it.

From what I understand from documentation and other Stack Overflow conversations, the allocateDirect allocates space right on the drive, and mostly avoids using RAM. I have a concern that the ByteBuffer created with allocateDirect might have a potential to overflow or not be allocated if the File infile is large, say 1GB. I am guaranteed at this point in the development of our software that the File will be no larger than 2 GB; but there is potential in the future that it might be as big as 10 or 20GB.

I have observed that the transferFrom loop never goes through the loop more than once... so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60MB. I looped though, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to specify more than 2GB at a time be transferred... Again, kernel expertise would help me understand.

Thanks in advance for your help!!

Using a ByteBuffer:

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
        //write the stringBuffer so it goes in the output file first:
        buff.put(sb.toString().getBytes());

        //create the FileChannels:
        inChan  = new RandomAccessFile(infile,  "r" ).getChannel();
        outChan = new RandomAccessFile(outfile, "rw").getChannel();

        //read the infile in to the buffer:
        inChan.read(buff);

        // prep the buffer:
        buff.flip();

        // write the buffer out to the file via the FileChannel:
        outChan.write(buff);
        inChan.close();
        outChan.close();
     } catch...etc

}

Using trasferTo (or transferFrom):

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        //write the stringBuffer so it goes in the output file first:    
        PrintWriter  fw = new PrintWriter(outfile);
        fw.write(sb.toString());
        fw.flush();
        fw.close();

        // create the channels appropriate for appending:
        outChan = new FileOutputStream(outfile, true).getChannel();
        inChan  = new RandomAccessFile(infile, "r").getChannel();

        long startSize = outfile.length();
        long inFileSize = infile.length();
        long bytesWritten = 0;

        //set the position where we should start appending the data:
        outChan.position(startSize);
        Byte startByte = outChan.position();

        while(bytesWritten < length){ 
            bytesWritten += outChan.transferFrom(inChan, startByte, (int) inFileSize);
            startByte = bytesWritten + 1;
        }

        inChan.close();
        outChan.close();
    } catch ... etc

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

滥情哥ㄟ 2024-11-15 13:00:53

由于数据复制较少,transferTo() 的效率会高得多;如果数据复制都可以在内核中完成,则无需复制。如果它不在您的平台上,它仍然会使用高度调整的代码。

您确实需要循环,有一天它会迭代并且您的代码将继续工作。

transferTo() can be far more efficient as there is less data copying, or none if it can all be done in the kernel. And if it isn't on your platform it will still use highly tuned code.

You do need the loop, one day it will iterate and your code will keep working.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文