How to check whether an InputStream is gzipped?
Is there any way to check if InputStream has been gzipped?
Here's the code:
    public static InputStream decompressStream(InputStream input) {
        try {
            GZIPInputStream gs = new GZIPInputStream(input);
            return gs;
        } catch (IOException e) {
            logger.info("Input stream not in the GZIP format, using standard format");
            return input;
        }
    }
I tried this way but it doesn't work as expected - values read from the stream are invalid.
EDIT:
Added the method I use to compress data:
    public static byte[] compress(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gs = new GZIPOutputStream(baos);
            gs.write(content);
            gs.close();
        } catch (IOException e) {
            logger.error("Fatal error occurred while compressing data");
            throw new RuntimeException(e);
        }
        double ratio = (1.0f * content.length / baos.size());
        if (ratio > 1) {
            logger.info("Compression ratio equals " + ratio);
            return baos.toByteArray();
        }
        logger.info("Compression not needed");
        return content;
    }
Comments (10)
It's not foolproof, but it's probably the easiest approach and it doesn't rely on any external data. Like all decent formats, GZip begins with a magic number, which can be checked quickly without reading the entire stream.
(Source for the magic number: GZip file format specification)
Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value, so if you really want to, you can use the lower two bytes of it.
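A minimal sketch of the magic-number check described above, assuming a PushbackInputStream is used so that the two peeked bytes can be put back before the stream is handed on. The class name GzipSniffer is hypothetical; GZIPInputStream.GZIP_MAGIC is the JDK constant (0x8b1f, stored low byte first in the stream as 0x1f 0x8b).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
final class GzipSniffer {

    /** Returns a decompressing stream if the input starts with the gzip magic number, otherwise the data unchanged. */
    static InputStream decompressStream(InputStream input) throws IOException {
        // A pushback buffer of 2 bytes lets us look ahead without losing data.
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        byte[] signature = new byte[2];

        // Read up to 2 bytes; the stream may be empty or hold a single byte.
        int len = 0;
        while (len < 2) {
            int n = pb.read(signature, len, 2 - len);
            if (n < 0) {
                break;
            }
            len += n;
        }
        // Push back exactly what was read so the caller sees the full stream.
        if (len > 0) {
            pb.unread(signature, 0, len);
        }

        boolean gzipped = len == 2
                && (signature[0] & 0xff) == (GZIPInputStream.GZIP_MAGIC & 0xff)
                && (signature[1] & 0xff) == (GZIPInputStream.GZIP_MAGIC >> 8);
        return gzipped ? new GZIPInputStream(pb) : pb;
    }
}
```

As the answer says, this is not foolproof: uncompressed data that happens to start with the bytes 0x1f 0x8b would be misdetected.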
In that case you need to check whether the HTTP Content-Encoding response header equals gzip. This is all clearly specified in the HTTP spec.
Update: regarding the way you compress the source of the stream: this ratio check is pretty... insane. Get rid of it. The same length does not necessarily mean that the bytes are the same. Let it always return the gzipped stream, so that you can always expect a gzipped stream and just apply GZIPInputStream without nasty checks.
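One way the "always gzip" suggestion might look, as a sketch rather than a drop-in replacement for the question's code: compress() never falls back to the raw bytes, so the reading side can apply GZIPInputStream unconditionally. The class name GzipRoundTrip and the decompress() helper are assumptions made for illustration, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipRoundTrip {

    /** Always returns gzip data, even when it is larger than the input. */
    static byte[] compress(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources closes the gzip stream, which writes the trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Counterpart to compress(): the input is expected to be gzip, so no detection is needed. */
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost is a few extra bytes for inputs that do not compress well; in exchange, the reader never has to guess the format.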
I found this useful example that provides a clean implementation of isCompressed():
I tested it with success:
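A minimal sketch of what such an isCompressed() check might look like, assuming the data is already available as a byte array. The class name GzipBytes is hypothetical; the comparison uses the JDK constant GZIPInputStream.GZIP_MAGIC.

```java
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
final class GzipBytes {

    /** Returns true if the array starts with the two-byte gzip magic number. */
    static boolean isCompressed(byte[] data) {
        if (data == null || data.length < 2) {
            return false; // too short to carry a gzip header
        }
        // The magic number is stored low byte first: 0x1f, then 0x8b.
        int header = (data[0] & 0xff) | ((data[1] & 0xff) << 8);
        return header == GZIPInputStream.GZIP_MAGIC;
    }
}
```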
I believe this is the simplest way to check whether a byte array is gzip formatted; it does not depend on any HTTP entity or MIME type support.
Building on the answer by @biziclop: this version uses the GZIP_MAGIC header and is additionally safe for empty or single-byte data streams.
This function works perfectly well in Java:
In Scala:
Not exactly what you are asking, but this could be an alternative approach if you are using HttpClient:
Wrap the original stream in a BufferedInputStream, call mark() on it, and then try to wrap that in a GZIPInputStream. (GZIPInputStream has no ZipEntry; entries belong to ZipInputStream and zip archives, which are a different format.)
If the constructor succeeds, the data is gzipped. If it throws an IOException, the data is not gzipped, and you can call reset() on the BufferedInputStream to return to the initial position in the stream.
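A sketch of the mark/reset idea, under the assumption that peeking at the two magic bytes is enough and no exception handling is needed for detection. The class name GzipPeek and the method name maybeDecompress are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
final class GzipPeek {

    /** Peeks at the first two bytes via mark/reset and wraps the stream only if they match the gzip magic. */
    static InputStream maybeDecompress(InputStream input) throws IOException {
        BufferedInputStream in = new BufferedInputStream(input);
        in.mark(2);          // remember the start; we read at most 2 bytes
        int b1 = in.read();  // -1 if the stream is empty
        int b2 = in.read();  // -1 if the stream has a single byte
        in.reset();          // rewind so nothing is lost

        boolean gzipped = b1 >= 0 && b2 >= 0
                && ((b1 & 0xff) | ((b2 & 0xff) << 8)) == GZIPInputStream.GZIP_MAGIC;
        return gzipped ? new GZIPInputStream(in) : in;
    }
}
```

Reading only two bytes keeps the mark's read limit small and fixed, whereas letting the GZIPInputStream constructor fail may consume a variable amount of header data before the exception is thrown.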
SimpleMagic is a Java library for resolving content types:
This is how to read a file that may or may not be gzipped:
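A possible sketch for the file case, assuming the file can simply be opened twice: once to inspect the first two bytes and once to read the content. The class name GzipFiles and its method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
final class GzipFiles {

    /** Reads the first two bytes of the file and compares them with the gzip magic number (0x1f 0x8b). */
    static boolean isGzipped(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int b1 = in.read();
            int b2 = in.read();
            return b1 == 0x1f && b2 == 0x8b;
        }
    }

    /** Opens the file for reading, decompressing on the fly if it looks gzipped. */
    static InputStream open(Path file) throws IOException {
        boolean gzipped = isGzipped(file);
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        if (!gzipped) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle on a corrupt header
            throw e;
        }
    }
}
```

Because a file can be reopened, no pushback or mark/reset is needed here; the caller is responsible for closing the returned stream.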