In what order should I use GZIPOutputStream and BufferedOutputStream?

Posted 2024-07-26 07:33:59 · 249 words · 5 views · 0 comments

Can anyone recommend whether I should do something like:

os = new GZIPOutputStream(new BufferedOutputStream(...));

or

os = new BufferedOutputStream(new GZIPOutputStream(...));

Which is more efficient? Should I use BufferedOutputStream at all?

Comments (6)

再见回来 2024-08-02 07:33:59

GZIPOutputStream already comes with a built-in buffer. So, there is no need to put a BufferedOutputStream right next to it in the chain. gojomo's excellent answer already provides some guidance on where to place the buffer.

The default buffer size for GZIPOutputStream is only 512 bytes, so you will want to increase it to 8K or even 64K via the constructor parameter. The default buffer size for BufferedOutputStream is 8K, which is why you can measure an advantage when combining the default GZIPOutputStream and BufferedOutputStream. That advantage can also be achieved by properly sizing the GZIPOutputStream's built-in buffer.

So, to answer your question: "Should I use BufferedOutputStream at all?" → No, in your case, you should not use it, but instead set the GZIPOutputStream's buffer to at least 8K.
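
A minimal sketch of that suggestion, assuming an in-memory ByteArrayOutputStream as a stand-in for the real destination. The class and method names (SizedGzipBuffer, compress, decompress) are made up for illustration; the only API being shown is the two-argument GZIPOutputStream(OutputStream, int) constructor, which sets the size of the built-in buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class SizedGzipBuffer {
    // Hypothetical helper: compresses data with a GZIPOutputStream whose
    // built-in buffer is bufferSize bytes instead of the 512-byte default.
    static byte[] compress(byte[] data, int bufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for "..."
        try (OutputStream os = new GZIPOutputStream(sink, bufferSize)) {
            os.write(data);
        } // close() finishes the gzip stream and writes the trailer
        return sink.toByteArray();
    }

    // Hypothetical helper: reads the gzip data back so the result can be checked.
    static byte[] decompress(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[100_000];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) (i % 50);
        }
        byte[] gz = compress(input, 8 * 1024); // 8K, as suggested above
        System.out.println("compressed bytes: " + gz.length);
        System.out.println("restored bytes:   " + decompress(gz).length);
    }
}
```

The buffer size changes how large the chunks handed to the underlying stream are, not the compressed content itself, so whether 8K or 64K is the better choice for a given destination is something to measure.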

仅此而已 2024-08-02 07:33:59

What order should I use GzipOutputStream and BufferedOutputStream

For object streams, I found that wrapping the buffered stream around the gzip stream for both input and output was almost always significantly faster. The smaller the objects, the bigger the gain. In all cases it was better than or the same as using no buffered stream.

ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(fis)));
oos = new ObjectOutputStream(new BufferedOutputStream(new GZIPOutputStream(fos)));

However, for text and straight byte streams, I found that it was a toss-up, with the gzip stream around the buffered stream being only slightly better. Both were better in all cases than no buffered stream.

reader = new InputStreamReader(new GZIPInputStream(new BufferedInputStream(fis)));
writer = new OutputStreamWriter(new GZIPOutputStream(new BufferedOutputStream(fos)));

I ran each version 20 times and cut off the first run and averaged the rest. I also tried buffered-gzip-buffered which was slightly better for objects and worse for text. I did not play with buffer sizes at all.
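
For reference, here is a self-contained sketch of the object-stream arrangement (buffered stream wrapped around the gzip stream) as a round trip. It is not the benchmark code from this answer: the class name, the helper methods, and the use of in-memory byte arrays in place of fis/fos are assumptions made so the example runs on its own.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class BufferedObjectGzip {
    // Chain: ObjectOutputStream -> BufferedOutputStream -> GZIPOutputStream -> sink.
    // Many small object writes are collected before they reach the gzip stream.
    static byte[] writeObjects(List<String> items) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for fos
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(sink)))) {
            oos.writeInt(items.size());
            for (String item : items) {
                oos.writeObject(item);
            }
        }
        return sink.toByteArray();
    }

    // Chain: ObjectInputStream -> BufferedInputStream -> GZIPInputStream -> source.
    static List<String> readObjects(byte[] gz) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new GZIPInputStream(new ByteArrayInputStream(gz))))) {
            int count = ois.readInt();
            List<String> items = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                items.add((String) ois.readObject());
            }
            return items;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> items = Arrays.asList("alpha", "beta", "gamma");
        byte[] gz = writeObjects(items);
        System.out.println(readObjects(gz));
    }
}
```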


For the object streams, I tested 2 serialized object files in the tens of megabytes. For the larger file (38 MB), it was 85% faster on reading (0.7 versus 5.6 seconds) but actually slightly slower for writing (5.9 versus 5.7 seconds). These objects had some large arrays in them, which may have meant larger writes.

method       crc     date  time    compressed    uncompressed  ratio
defla   eb338650   May 19 16:59      14027543        38366001  63.4%

For the smaller file (18 MB), it was 75% faster for reading (1.6 versus 6.1 seconds) and 40% faster for writing (2.8 versus 4.7 seconds). It contained a large number of small objects.

method       crc     date  time    compressed    uncompressed  ratio
defla   92c9d529   May 19 16:56       6676006        17890857  62.7%

For the text reader/writer I used a 64 MB CSV text file. The gzip stream around the buffered stream was 11% faster for reading (950 versus 1070 milliseconds) and slightly faster when writing (7.9 versus 8.1 seconds).

method       crc     date  time    compressed    uncompressed  ratio
defla   c6b72e34   May 20 09:16      22560860        63465800  64.5%

送舟行 2024-08-02 07:33:59

Buffering helps when the ultimate destination of the data is best read or written in larger chunks than your code would otherwise push to it. So you generally want the buffering as close as possible to the place that wants larger chunks. In your examples, that's the elided "...", so wrap the BufferedOutputStream with the GZIPOutputStream. Then tune the BufferedOutputStream buffer size to match what testing shows works best with the destination.

I doubt the BufferedOutputStream on the outside would help much, if at all, over no explicit buffering. Why not? The GZIPOutputStream will do its write()s to "..." in the same-sized chunks whether the outside buffering is present or not. So there's no optimizing for "..." possible; you're stuck with whatever sizes GZIPOutputStream write()s.

Note also that you're using memory more efficiently by buffering the compressed data rather than the uncompressed data. If your data often achieves 6X compression, the 'inside' buffer is equivalent to an 'outside' buffer 6X as big.
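
One way to check this reasoning is to count how many write calls actually reach the destination under each arrangement. The sketch below is an illustration only: CountingSink is a hypothetical stand-in for the elided "...", the 1 MB of random input and the 100-byte application writes are arbitrary assumptions, and the exact counts will depend on the JDK and the data.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class SinkWriteCounter {
    // Hypothetical destination: discards the data but counts the write calls it receives.
    static final class CountingSink extends OutputStream {
        long calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    // Pushes data through the chain in small pieces, then closes the chain.
    static void pump(OutputStream chain, byte[] data, int piece) throws IOException {
        try (OutputStream os = chain) {
            for (int off = 0; off < data.length; off += piece) {
                os.write(data, off, Math.min(piece, data.length - off));
            }
        }
    }

    // Returns write-call counts at the sink for: no buffer, buffer outside, buffer inside.
    static long[] measure() throws IOException {
        byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data); // random bytes barely compress, so output stays large

        CountingSink plain = new CountingSink();
        pump(new GZIPOutputStream(plain), data, 100);

        CountingSink outer = new CountingSink();
        pump(new BufferedOutputStream(new GZIPOutputStream(outer)), data, 100);

        CountingSink inner = new CountingSink();
        pump(new GZIPOutputStream(new BufferedOutputStream(inner)), data, 100);

        return new long[] { plain.calls, outer.calls, inner.calls };
    }

    public static void main(String[] args) throws IOException {
        long[] calls = measure();
        System.out.println("no buffer:      " + calls[0] + " writes to sink");
        System.out.println("buffer outside: " + calls[1] + " writes to sink");
        System.out.println("buffer inside:  " + calls[2] + " writes to sink");
    }
}
```

This only measures calls into the destination. It says nothing about the per-call cost of writing into the gzip stream itself, which is where an outside buffer could still matter.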

半世晨晓 2024-08-02 07:33:59

Normally you want a buffer close to your FileOutputStream (assuming that's what ... represents) to avoid too many calls into the OS and frequent disk access. However, if you're writing a lot of small chunks to the GZIPOutputStream, you might benefit from a buffer around the GZIPOutputStream as well. The reason is that the write method in GZIPOutputStream is synchronized and also leads to a few other synchronized calls and a couple of native (JNI) calls (to update the CRC32 and do the actual compression). These all add extra overhead per call. So in that case I'd say you'll benefit from both buffers.
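
A sketch of the two-buffer arrangement described above, with single-byte writes as the "lots of small chunks" case. The class and method names are hypothetical, and a ByteArrayOutputStream stands in for the FileOutputStream so the example is self-contained; the 8192-byte size is just an example value.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

class DoubleBufferedGzip {
    // Builds: BufferedOutputStream -> GZIPOutputStream -> BufferedOutputStream -> sink.
    // The outer buffer batches small application writes before they hit the gzip stream;
    // the inner buffer batches compressed output before it hits the sink.
    static OutputStream open(OutputStream sink, int bufferSize) throws IOException {
        OutputStream nearSink = new BufferedOutputStream(sink, bufferSize);
        OutputStream gzip = new GZIPOutputStream(nearSink);
        return new BufferedOutputStream(gzip, bufferSize);
    }

    // Writes count single bytes, the worst case for per-call overhead.
    static byte[] writeSmallChunks(int count) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a file
        try (OutputStream os = open(sink, 8192)) {
            for (int i = 0; i < count; i++) {
                os.write(i & 0xFF);
            }
        } // closing the outermost stream flushes and closes the whole chain
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = writeSmallChunks(1_000_000);
        System.out.println("compressed bytes: " + gz.length);
    }
}
```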

倾城°AllureLove 2024-08-02 07:33:59

I suggest you try a simple benchmark that times how long it takes to compress a large file and see if it makes much difference. GZIPOutputStream does have buffering, but it is a smaller buffer. I would use the first option (GZIPOutputStream wrapping a BufferedOutputStream) with a 64K buffer, but you might find that doing both is better.
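
A rough sketch of such a benchmark follows. Everything in it is an assumption chosen for illustration: the Wrapper interface and class name are hypothetical, the input is 8 MB of synthetic CSV-like text written in 100-byte pieces to a temporary file, and a single wall-clock timing per round is a crude measure. Treat the numbers it prints as a starting point, not a result.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

class GzipOrderBenchmark {
    // Hypothetical hook: builds the stream chain under test on top of the file stream.
    interface Wrapper {
        OutputStream wrap(OutputStream file) throws IOException;
    }

    // Times one pass of writing data, in chunk-sized pieces, through the given chain.
    static long timeNanos(Wrapper wrapper, byte[] data, int chunk) throws IOException {
        Path tmp = Files.createTempFile("gzip-bench", ".gz");
        try {
            long start = System.nanoTime();
            try (OutputStream os = wrapper.wrap(new FileOutputStream(tmp.toFile()))) {
                for (int off = 0; off < data.length; off += chunk) {
                    os.write(data, off, Math.min(chunk, data.length - off));
                }
            }
            return System.nanoTime() - start;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] line = "id,name,value,2024-01-01\n".getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[8 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = line[i % line.length];
        }

        Wrapper gzipOnly = f -> new GZIPOutputStream(f);
        Wrapper gzipOverBuffered = f -> new GZIPOutputStream(new BufferedOutputStream(f, 64 * 1024));
        Wrapper bufferedOverGzip = f -> new BufferedOutputStream(new GZIPOutputStream(f), 64 * 1024);

        // Round 0 is a warm-up; compare the later rounds.
        for (int round = 0; round < 5; round++) {
            System.out.printf("round %d: gzip only %d ms, gzip(buffered) %d ms, buffered(gzip) %d ms%n",
                    round,
                    timeNanos(gzipOnly, data, 100) / 1_000_000,
                    timeNanos(gzipOverBuffered, data, 100) / 1_000_000,
                    timeNanos(bufferedOverGzip, data, 100) / 1_000_000);
        }
    }
}
```

For anything beyond a quick comparison, a proper harness and a realistic file and write pattern from your own application would give more trustworthy numbers.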

り繁华旳梦境 2024-08-02 07:33:59

Read the javadoc, and you will discover that BufferedInputStream is used to buffer bytes read from some original source. Once you have the raw bytes you want to decompress them, so you wrap the BufferedInputStream with a GZIPInputStream. Buffering the output from a GZIP stream makes no sense, because you then have to ask what buffers the data going into GZIP: who is going to do that?

new GzipInputStream( new BufferedInputStream ( new FileInputXXX