Get the uncompressed size of this GZIPInputStream?
I have a GZIPInputStream that I constructed from another ByteArrayInputStream. I want to know the original (uncompressed) length of the gzip data. Although I could read to the end of the GZIPInputStream and count the bytes, that would take a lot of time and waste CPU. I would like to know the size before reading it.
Is there a method similar to ZipEntry.getSize() for GZIPInputStream:
public long getSize()
Since: API Level 1
Gets the uncompressed size of this ZipEntry.
Answers (8)
It is possible to determine the uncompressed size by reading the last four bytes of the gzipped file.
I found this solution here:
http://www.abeel.be/content/determine-uncompressed-size-gzip-file
The link also includes some example code (corrected to use long instead of int, to cope with sizes between 2 GB and 4 GB, which would make an int wrap around); in it, val is the length in bytes. Beware: you cannot determine the correct uncompressed size when the uncompressed file was larger than 4 GB, because the gzip trailer stores the size only modulo 2^32!
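A minimal sketch of the same technique, assuming the compressed data is the byte array that feeds the question's ByteArrayInputStream and that it holds exactly one gzip member. It assembles the little-endian ISIZE trailer into a long named val, as in the answer. The class and the helpers uncompressedSize and gzip are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class GzipTrailerSize {

    // Reads ISIZE from the gzip trailer: the last 4 bytes, stored little-endian,
    // holding the uncompressed length modulo 2^32.
    // Assumes the array holds exactly one complete gzip member.
    static long uncompressedSize(byte[] gz) {
        // A 10-byte header plus an 8-byte trailer is a lower bound on gzip size.
        if (gz.length < 18) {
            throw new IllegalArgumentException("too short to be gzip data");
        }
        int n = gz.length;
        int b4 = gz[n - 4] & 0xFF; // least significant byte comes first
        int b3 = gz[n - 3] & 0xFF;
        int b2 = gz[n - 2] & 0xFF;
        int b1 = gz[n - 1] & 0xFF; // most significant byte comes last
        // Build the value in a long so sizes between 2 GB and 4 GB stay positive.
        long val = ((long) b1 << 24) | ((long) b2 << 16) | ((long) b3 << 8) | b4;
        return val;
    }

    // Hypothetical helper for the demo: gzip a byte array in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip(new byte[100_000]);
        System.out.println(uncompressedSize(compressed)); // prints 100000
    }
}
```

This reads only four bytes, so it costs nothing compared with decompressing, but it inherits both caveats above: one gzip member only, and sizes are reported modulo 4 GB.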
Based on @Alexander's answer:
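One way to write that, assuming the gzip data sits in a file containing a single gzip member: seek to the 4-byte trailer with RandomAccessFile and decode it with a little-endian ByteBuffer. GzipFileSize and uncompressedSize are illustrative names, not an existing API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GzipFileSize {

    // Returns the ISIZE trailer field of a single-member .gz file,
    // i.e. the uncompressed length modulo 2^32.
    static long uncompressedSize(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(raf.length() - 4);               // jump to the 4-byte ISIZE field
            byte[] bytes = new byte[4];
            raf.readFully(bytes);                     // unlike read(), never returns short
            int isize = ByteBuffer.wrap(bytes)
                    .order(ByteOrder.LITTLE_ENDIAN)   // gzip stores ISIZE little-endian
                    .getInt();
            return Integer.toUnsignedLong(isize);     // keep 2 GB to 4 GB sizes positive
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(uncompressedSize(new File(args[0])));
    }
}
```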
No. It's not in the Javadoc => it doesn't exist.
What do you need the length for?
There is no reliable way to get the length other than decompressing the whole thing. See "Uncompressed file size using zlib's gzip file access function".
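The reliable route, decompressing everything and counting, can be sketched as follows for the question's in-memory data. It still pays the full decompression cost the asker wanted to avoid, but it only needs a small fixed buffer rather than the whole uncompressed output, and the count is a long, so it is not limited to 4 GB. It assumes Java 11+ for OutputStream.nullOutputStream(); GzipCount and countUncompressed are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCount {

    // Decompresses the whole stream and counts the bytes. Exact, and not
    // limited to 4 GB, but it costs the full decompression time.
    static long countUncompressed(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            // transferTo (Java 9+) returns the number of bytes it copied;
            // OutputStream.nullOutputStream() (Java 11+) discards them.
            return in.transferTo(OutputStream.nullOutputStream());
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(new byte[250_000]);
        }
        System.out.println(countUncompressed(bos.toByteArray())); // prints 250000
    }
}
```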
If you can guess at the compression ratio (a reasonable expectation if the data is similar to other data you've already processed), then you can work out the size of arbitrarily large files (with some error). Again, this assumes a file containing a single gzip stream. Since the true size must be the 4-byte trailer value plus some multiple of 2^32, the following takes as the true size the first such candidate that is greater than 90% of the estimated size (based on the estimated ratio):
[setting estCompRatio to 0 is equivalent to @Alexander's answer]
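A sketch of that estimation, assuming the trailer value has already been read (for example with one of the four-byte methods in the other answers). The parameter name estCompRatio follows the note above; treating it as the ratio of uncompressed to compressed size, and the names GzipSizeEstimate and estimateTrueSize, are assumptions of this sketch. It uses "at least 90%" so that a ratio of 0 returns the trailer value unchanged even when that value is 0.

```java
public class GzipSizeEstimate {

    private static final long TWO_POW_32 = 1L << 32;

    // isize: the 4-byte trailer value (true size modulo 2^32), as an unsigned number.
    // compressedLength: size of the gzip data in bytes.
    // estCompRatio: guessed ratio of uncompressed to compressed size; 0 returns isize.
    static long estimateTrueSize(long isize, long compressedLength, double estCompRatio) {
        if (!(estCompRatio >= 0) || Double.isInfinite(estCompRatio)) {
            throw new IllegalArgumentException("estCompRatio must be finite and >= 0");
        }
        double threshold = 0.9 * compressedLength * estCompRatio;
        long candidate = isize;
        // The true size is isize + k * 2^32 for some k >= 0; take the smallest
        // candidate that reaches 90% of the estimated size.
        while (candidate < threshold) {
            candidate += TWO_POW_32;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Hypothetical input: a 1 GiB gzip file assumed to compress about 10:1,
        // whose trailer reads 1,705,032,704.
        System.out.println(estimateTrueSize(1_705_032_704L, 1L << 30, 10.0)); // prints 10294967296
    }
}
```

In the example, the estimate is about 10.7 GB and the threshold about 9.66 GB, so the candidates 1,705,032,704 and 6,000,000,000 are skipped and 10,294,967,296 is returned.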
A more compact version of the calculation based on the 4 tail bytes (it avoids using a byte buffer and calls Integer.reverseBytes to reverse the byte order of the bytes read).
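A sketch of that compact form, assuming a single-member gzip file. RandomAccessFile.readInt() decodes big-endian, so Integer.reverseBytes turns the result into the little-endian ISIZE value, and masking with 0xFFFFFFFFL treats the 32 bits as unsigned. GzipTailSize and uncompressedSize are illustrative names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class GzipTailSize {

    // Reads the gzip ISIZE trailer (uncompressed size modulo 2^32) without a ByteBuffer.
    static long uncompressedSize(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(raf.length() - 4);
            // readInt() is big-endian; ISIZE is little-endian, so reverse the bytes,
            // then mask so values of 2 GB and above are not negative.
            return Integer.reverseBytes(raf.readInt()) & 0xFFFFFFFFL;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(uncompressedSize(new File(args[0])));
    }
}
```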
Get the FileChannel from the underlying FileInputStream instead. It tells you both the size of the compressed file and the current position within it. Example:
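A sketch of that approach, assuming the data comes from a file. Note that it measures progress through the compressed input rather than giving the uncompressed size up front, and it needs a FileInputStream, since a ByteArrayInputStream like the one in the question has no channel. GzipProgress and readWithProgress are illustrative names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.zip.GZIPInputStream;

public class GzipProgress {

    // Decompresses a .gz file, printing progress measured on the compressed
    // file via the FileChannel of the underlying FileInputStream.
    // Returns the number of uncompressed bytes read.
    static long readWithProgress(File file) throws IOException {
        try (FileInputStream fis = new FileInputStream(file);
             GZIPInputStream gz = new GZIPInputStream(fis)) {
            FileChannel ch = fis.getChannel();
            long total = Math.max(ch.size(), 1);   // compressed size; avoid dividing by 0
            byte[] buf = new byte[8192];
            long uncompressed = 0;
            long lastPercent = -1;
            int n;
            while ((n = gz.read(buf)) != -1) {
                uncompressed += n;
                // position() counts compressed bytes already pulled into
                // GZIPInputStream's input buffer, so it runs slightly ahead.
                long percent = 100 * ch.position() / total;
                if (percent != lastPercent) {
                    System.out.println(percent + "%");
                    lastPercent = percent;
                }
            }
            return uncompressed;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithProgress(new File(args[0])) + " bytes");
    }
}
```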
No, unfortunately if you wanted to get the uncompressed size, you would have to read the entire stream and increment a counter like you mention in your question. Why do you need to know the size? Could an estimation of the size work for your purposes?