Fastest way to incrementally read a large file



When given a buffer of MAX_BUFFER_SIZE and a file that far exceeds it, how can one:

  1. Read the file in blocks of MAX_BUFFER_SIZE?
  2. Do it as fast as possible?

I tried using NIO:

    RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);

    int bytesRead = inChannel.read(buffer);

    while (bytesRead > 0) {
        buffer.flip();

        while (buffer.hasRemaining()) {
            buffer.get();
        }

        buffer.clear();
        bytesRead = inChannel.read(buffer);
    }

    aFile.close();

And regular IO:

    InputStream in = new FileInputStream(fileName);

    long length = new File(fileName).length(); // file size; calling length() on the String would give the name's length

    if (length > Integer.MAX_VALUE) {
        throw new IOException("File is too large!");
    }

    byte[] bytes = new byte[(int) length];

    int offset = 0;
    int numRead = 0;

    while (offset < bytes.length
            && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
        offset += numRead;
    }

    if (offset < bytes.length) {
        throw new IOException("Could not completely read file " + fileName);
    }

    in.close();

It turns out that regular IO is about 100 times faster than NIO at doing the same thing. Am I missing something? Is this expected? Is there a faster way to read the file in buffer-sized chunks?

Ultimately, I am working with a large file that I don't have enough memory to read all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.


2 Answers

雄赳赳气昂昂 2025-01-05 06:48:13


If you want to make your first example faster:

    FileChannel inChannel = new FileInputStream(fileName).getChannel();
    ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);

    while (inChannel.read(buffer) > 0) {
        buffer.clear(); // do something with the data and clear/compact it.
    }

    inChannel.close();

If you want it to be even faster:

FileChannel inChannel = new RandomAccessFile(fileName, "r").getChannel();
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
// access the buffer as you wish.
inChannel.close();

Mapping the file this way can take only 10 to 20 microseconds for files up to 2 GB in size.
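
Note that a single MappedByteBuffer is limited to Integer.MAX_VALUE bytes, so a file larger than about 2 GB would have to be mapped in windows. A minimal sketch of that, not from the answer above (the 512 MB window size is an arbitrary choice):

    FileChannel ch = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
    long size = ch.size();
    long window = 512L * 1024 * 1024; // arbitrary window size; each mapping must stay under 2 GB

    for (long pos = 0; pos < size; pos += window) {
        long len = Math.min(window, size - pos);
        MappedByteBuffer chunk = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
        // process the mapped window here
    }

    ch.close();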

半步萧音过轻尘 2025-01-05 06:48:13


Assuming that you need to read the entire file into memory at once (as you're currently doing), neither reading smaller chunks nor NIO is going to help you here.

In fact, you'd probably be best off reading larger chunks, which your regular IO code is automatically doing for you.

Your NIO code is currently slower because you're reading only one byte at a time (using buffer.get()).
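
For illustration, a minimal sketch of the same NIO loop draining each fill with one bulk buffer.get(byte[], int, int) call instead of per-byte gets (the chunk array is an addition, and CAPACITY stands in for the asker's constant):

    RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
    FileChannel inChannel = aFile.getChannel();
    ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);
    byte[] chunk = new byte[CAPACITY];

    while (inChannel.read(buffer) > 0) {
        buffer.flip();
        int n = buffer.remaining();
        buffer.get(chunk, 0, n); // one bulk copy instead of n single-byte gets
        // process chunk[0..n) here
        buffer.clear();
    }

    aFile.close();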

If you want to process the file in chunks (for example, transferring between streams), here is a standard way of doing it without NIO:

    InputStream is = ...;
    OutputStream os = ...;

    byte[] buffer = new byte[1024];
    int read;
    while ((read = is.read(buffer)) != -1) {
        os.write(buffer, 0, read);
    }

This uses a buffer size of only 1 KB, but can transfer an unlimited amount of data.
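
For what it's worth, here is a sketch of the same loop using try-with-resources so both streams are closed even if an exception is thrown (inFile, outFile, and the 8 KB buffer are assumptions for illustration):

    try (InputStream is = new FileInputStream(inFile);
         OutputStream os = new FileOutputStream(outFile)) {
        byte[] buffer = new byte[8192]; // 8 KB is an arbitrary choice; a larger buffer means fewer read() calls
        int read;
        while ((read = is.read(buffer)) != -1) {
            os.write(buffer, 0, read);
        }
    }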

(If you extend your question with details of what you're actually looking to do at a functional level, I could improve this answer further.)
