Buffering a large file; BufferedInputStream limited to 2 GB; arrays limited to 2^31 bytes

Posted on 2024-07-05 12:23:59

I am sequentially processing a large file and I'd like to keep a large chunk of it in memory; 16 GB of RAM is available on a 64-bit system.

A quick and dirty way to do this is simply to wrap the input stream in a BufferedInputStream; unfortunately, that only gives me a 2 GB buffer. I'd like to have more of the file in memory. What alternatives do I have?

Comments (5)

赢得她心 2024-07-12 12:23:59

How about letting the OS deal with buffering the file? Have you checked what the performance impact of not copying the whole file into the JVM's memory is?

EDIT: You could then use either RandomAccessFile or a FileChannel to efficiently read the necessary parts of the file into the JVM's memory.
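A minimal sketch of that positional-read idea, assuming a hypothetical file name large-file.bin and an arbitrary 64 MB window; FileChannel.read(ByteBuffer, long) pulls in just the region you need without seeking a stream:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ChannelRead {
        public static void main(String[] args) throws IOException {
            Path path = Path.of("large-file.bin");   // hypothetical file name

            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                // Read a 64 MB window starting at an arbitrary offset; the OS
                // page cache keeps recently read regions warm between calls.
                ByteBuffer window = ByteBuffer.allocateDirect(64 * 1024 * 1024);
                long offset = 0L;
                int read = channel.read(window, offset);   // positional read, no seek
                window.flip();
                System.out.println("Read " + read + " bytes at offset " + offset);
            }
        }
    }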

夜巴黎 2024-07-12 12:23:59

Have you considered the MappedByteBuffer in java.nio? It's over my head, but maybe it's what you are looking for.
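A minimal sketch of what that looks like, assuming the same hypothetical large-file.bin; note that a single MappedByteBuffer mapping cannot exceed Integer.MAX_VALUE bytes, so only part of a very large file fits in one mapping:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MapOneWindow {
        public static void main(String[] args) throws IOException {
            Path path = Path.of("large-file.bin");   // hypothetical file name

            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                // Map up to the first 1 GB; one mapping is capped at 2^31 - 1 bytes,
                // so a bigger file needs several mappings.
                long size = Math.min(channel.size(), 1L << 30);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);

                // Pages are loaded lazily by the OS as the buffer is touched.
                System.out.println("Mapped " + buffer.capacity() + " bytes");
            }
        }
    }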

恍梦境° 2024-07-12 12:23:59

I doubt that buffering more than 2 GB at a time is going to be a huge win anyway. Depending on the amount of processing you're doing, you might be able to read nearly as fast as you process. To speed it up, you might try a two-threaded producer-consumer model (one thread reads the file and hands the data off to the other thread for processing).
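A minimal sketch of that two-threaded producer-consumer model, assuming the hypothetical large-file.bin, 8 MB chunks, and a bounded queue so the reader never runs far ahead of the processor:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumerRead {
        private static final byte[] POISON_PILL = new byte[0];   // signals end of file

        public static void main(String[] args) throws Exception {
            // Bounded queue: the reader can stay a little ahead of the processor
            // without buffering the whole file in memory.
            BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);

            Thread reader = new Thread(() -> {
                try (BufferedInputStream in =
                         new BufferedInputStream(new FileInputStream("large-file.bin"))) {
                    byte[] chunk = new byte[8 * 1024 * 1024];     // 8 MB read buffer
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        byte[] copy = new byte[n];
                        System.arraycopy(chunk, 0, copy, 0, n);
                        queue.put(copy);                          // blocks when the queue is full
                    }
                    queue.put(POISON_PILL);
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });

            Thread processor = new Thread(() -> {
                try {
                    long total = 0;
                    byte[] chunk;
                    while ((chunk = queue.take()) != POISON_PILL) {
                        total += chunk.length;                    // stand-in for real processing
                    }
                    System.out.println("Processed " + total + " bytes");
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });

            reader.start();
            processor.start();
            reader.join();
            processor.join();
        }
    }

The bounded queue is the important design choice: it overlaps I/O with processing while capping how much data sits in memory at once.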

最丧也最甜 2024-07-12 12:23:59

The OS is going to cache as much of the file as it can, so trying to outsmart the cache manager probably isn't going to gain you very much.

From a performance perspective, you will be much better served by keeping the bytes outside the JVM (transferring huge chunks of data between the OS and the JVM is relatively slow). You can achieve this by using a MappedByteBuffer backed by a direct memory block.

Here's a pertinent how-to type of article: article
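A minimal sketch of mapping a file larger than 2 GB in windows, again assuming the hypothetical large-file.bin; each MappedByteBuffer is capped at Integer.MAX_VALUE bytes, so the file is covered by a list of read-only mappings whose pages live in the OS page cache rather than on the Java heap:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkedMapping {
        // One mapping is limited to Integer.MAX_VALUE bytes, so use 1 GB windows.
        private static final long WINDOW = 1L << 30;

        public static void main(String[] args) throws IOException {
            Path path = Path.of("large-file.bin");   // hypothetical file name

            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                long size = channel.size();
                List<MappedByteBuffer> windows = new ArrayList<>();

                // Map the whole file as a sequence of read-only windows; the bytes
                // stay in the OS page cache instead of being copied onto the heap.
                for (long pos = 0; pos < size; pos += WINDOW) {
                    long len = Math.min(WINDOW, size - pos);
                    windows.add(channel.map(FileChannel.MapMode.READ_ONLY, pos, len));
                }

                System.out.println("Mapped " + size + " bytes in " + windows.size() + " windows");
            }
        }
    }

One caveat: the standard API has no explicit unmap, so a mapping is only released once its MappedByteBuffer is garbage collected.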

又怨 2024-07-12 12:23:59

I think there are 64-bit JVMs that will support nonstandard limits.

You might try buffering chunks.
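A minimal sketch of that chunked-buffering suggestion, assuming the hypothetical large-file.bin and an arbitrary 256 MB block size; since one Java array is capped at about 2^31 bytes, the file is held as a list of blocks on a large 64-bit heap:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkBuffer {
        public static void main(String[] args) throws IOException {
            // A single Java array tops out near 2^31 bytes, so keep the file
            // as a list of fixed-size blocks instead of one giant byte[].
            final int BLOCK = 256 * 1024 * 1024;    // 256 MB per block (arbitrary)
            List<byte[]> blocks = new ArrayList<>();

            try (BufferedInputStream in =
                     new BufferedInputStream(new FileInputStream("large-file.bin"))) {
                byte[] block = new byte[BLOCK];
                int filled = 0;
                int n;
                while ((n = in.read(block, filled, BLOCK - filled)) != -1) {
                    filled += n;
                    if (filled == BLOCK) {          // block is full, start a new one
                        blocks.add(block);
                        block = new byte[BLOCK];
                        filled = 0;
                    }
                }
                if (filled > 0) {                   // keep the partially filled tail
                    byte[] tail = new byte[filled];
                    System.arraycopy(block, 0, tail, 0, filled);
                    blocks.add(tail);
                }
            }

            long total = blocks.stream().mapToLong(b -> b.length).sum();
            System.out.println("Buffered " + total + " bytes in " + blocks.size() + " blocks");
        }
    }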
