How to check if an InputStream is gzipped?

Posted 2024-10-14 11:20:34, 1061 characters, 6 views, 0 comments

Is there any way to check if InputStream has been gzipped?
Here's the code:

public static InputStream decompressStream(InputStream input) {
    try {
        GZIPInputStream gs = new GZIPInputStream(input);
        return gs;
    } catch (IOException e) {
        logger.info("Input stream not in the GZIP format, using standard format");
        return input;
    }
}

I tried this way but it doesn't work as expected - values read from the stream are invalid.
EDIT:
Added the method I use to compress data:

public static byte[] compress(byte[] content) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        GZIPOutputStream gs = new GZIPOutputStream(baos);
        gs.write(content);
        gs.close();
    } catch (IOException e) {
        logger.error("Fatal error occurred while compressing data");
        throw new RuntimeException(e);
    }
    double ratio = (1.0f * content.length / baos.size());
    if (ratio > 1) {
        logger.info("Compression ratio equals " + ratio);
        return baos.toByteArray();
    }
    logger.info("Compression not needed");
    return content;

}

Comments (10)

驱逐舰岛风号 2024-10-21 11:20:34

It's not foolproof but it's probably the easiest and doesn't rely on any external data. Like all decent formats, GZip too begins with a magic number which can be quickly checked without reading the entire stream.

public static InputStream decompressStream(InputStream input) throws IOException {
     PushbackInputStream pb = new PushbackInputStream( input, 2 ); //we need a pushbackstream to look ahead
     byte [] signature = new byte[2];
     int len = pb.read( signature ); //read the signature; len is -1 for an empty stream and may be 1 on a short read
     if( len > 0 )
       pb.unread( signature, 0, len ); //push back whatever was read
     if( len == 2 && signature[ 0 ] == (byte) 0x1f && signature[ 1 ] == (byte) 0x8b ) //check if matches standard gzip magic number
       return new GZIPInputStream( pb );
     else 
       return pb;
}

(Source for the magic number: GZip file format specification)

Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value, so if you really want to, you can use the lower two bytes of it.
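
To illustrate the "not foolproof" caveat: data can start with 0x1f 0x8b without being valid gzip. The sketch below compares the two-byte check with actually decoding the data. The class and method names (MagicCheckDemo, hasGzipMagic, decodesAsGzip, gzip) are made up for this example and are not part of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class MagicCheckDemo {

    // True if the first two bytes are the gzip magic number 0x1f 0x8b.
    static boolean hasGzipMagic(byte[] data) {
        return data != null && data.length >= 2
                && data[0] == (byte) 0x1f && data[1] == (byte) 0x8b;
    }

    // True only if the whole array can be decoded as gzip.
    static boolean decodesAsGzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            while (gz.read(buf) != -1) {
                // drain the stream; header or trailer problems surface as IOException
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Compresses data so the demo has a genuine gzip sample to compare against.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A byte array such as {0x1f, 0x8b, 0x68, 0x69} passes hasGzipMagic but fails decodesAsGzip, so the magic check is best treated as a cheap first filter rather than a proof.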

你与昨日 2024-10-21 11:20:34

The InputStream comes from HttpURLConnection#getInputStream()

In that case, you need to check whether the HTTP Content-Encoding response header equals gzip.

URLConnection connection = url.openConnection();
InputStream input = connection.getInputStream();

if ("gzip".equals(connection.getContentEncoding())) {
    input = new GZIPInputStream(input);
}

// ...

This is all clearly specified in the HTTP spec.


Update: given how you compress the source of the stream, this ratio check is pretty... insane. Get rid of it. With the check in place, compress() sometimes returns gzipped bytes and sometimes the raw content, so the reader has to guess which one it got. The same length does not necessarily mean that the bytes are the same. Let it always return the gzipped stream, so that you can always expect a gzipped stream and just apply GZIPInputStream without nasty checks.
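
A minimal sketch of that advice, assuming the data stays in byte arrays as in the question. The class name AlwaysGzipCodec and the decompress method are illustrative additions, and the logger calls from the question are left out to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative names; not part of the original code.
final class AlwaysGzipCodec {

    // Always returns gzip data, even when that is larger than the input.
    static byte[] compress(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // The reader can rely on the input being gzip, so no detection is needed.
    static byte[] decompress(byte[] gzipped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The trade-off is that very small inputs grow by the size of the gzip header and trailer, in exchange for a format the reader never has to guess.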

时光沙漏 2024-10-21 11:20:34

I found this useful example that provides a clean implementation of isCompressed():

/**
 * Determines if a byte array is compressed. The java.util.zip GZip
 * implementation does not expose the GZip header so it is difficult to determine
 * if a byte array is compressed.
 *
 * @param bytes an array of bytes
 * @return true if the array starts with the GZip magic number, false otherwise
 */
 public boolean isCompressed(byte[] bytes)
 {
      if ((bytes == null) || (bytes.length < 2))
      {
           return false;
      }
      // GZIP_MAGIC is 0x8b1f; its low byte (0x1f) comes first in the data
      return (bytes[0] == (byte) GZIPInputStream.GZIP_MAGIC)
              && (bytes[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
 }

I tested it with success:

@Test
public void testIsCompressed() {
    assertFalse(util.isCompressed(originalBytes));
    assertTrue(util.isCompressed(compressed));
}
探春 2024-10-21 11:20:34

I believe this is the simplest way to check whether a byte array is gzip formatted. It does not depend on any HTTP entity or MIME type support.

public static boolean isGzipStream(byte[] bytes) {
      if (bytes == null || bytes.length < 2) {
            return false; // too short to hold the two-byte magic number
      }
      int head = ((int) bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
      return (GZIPInputStream.GZIP_MAGIC == head);
}
海拔太高太耀眼 2024-10-21 11:20:34

Building on the answer by @biziclop - this version uses the GZIP_MAGIC header and additionally is safe for empty or single byte data streams.

public static InputStream maybeDecompress(InputStream input) throws IOException {
    final PushbackInputStream pb = new PushbackInputStream(input, 2);

    int header = pb.read();
    if(header == -1) {
        return pb;
    }

    int b = pb.read();
    if(b == -1) {
        pb.unread(header);
        return pb;
    }

    pb.unread(new byte[]{(byte)header, (byte)b});

    header = (b << 8) | header;

    if(header == GZIPInputStream.GZIP_MAGIC) {
        return new GZIPInputStream(pb);
    } else {
        return pb;
    }
}
韵柒 2024-10-21 11:20:34

This function works well in Java:

public static boolean isGZipped(File f) throws IOException {
    // try-with-resources closes the file even if a read fails
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        return GZIPInputStream.GZIP_MAGIC == (raf.read() & 0xff | ((raf.read() << 8) & 0xff00));
    }
}

In Scala:

def isGZip(file: File): Boolean = {
   val raf = new RandomAccessFile(file, "r")
   try {
      val gzip = raf.read() & 0xff | ((raf.read() << 8) & 0xff00)
      gzip == GZIPInputStream.GZIP_MAGIC
   } finally raf.close()
}
破晓 2024-10-21 11:20:34

Not exactly what you are asking but could be an alternative approach if you are using HttpClient:

private static InputStream getInputStream(HttpEntity entity) throws IOException {
  Header encoding = entity.getContentEncoding(); 
  if (encoding != null) {
     if (encoding.getValue().equals("gzip") || encoding.getValue().equals("zip") || encoding.getValue().equals("application/x-gzip-compressed")) {
        return new GZIPInputStream(entity.getContent());
     }
  }
  return entity.getContent();
}
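
One detail worth handling when comparing the header value: HTTP content-coding names are case-insensitive, and the HTTP spec asks recipients to treat "x-gzip" as equivalent to "gzip". The helper below is a sketch that works on the plain header string, so it does not depend on HttpClient. The class and method names are assumptions, and it deliberately ignores headers that list several codings separated by commas.

```java
import java.util.Locale;

// Hypothetical helper; the names are assumptions for this sketch.
final class ContentEncodings {

    // Returns true for "gzip" or its alias "x-gzip", ignoring case and surrounding whitespace.
    static boolean isGzip(String contentEncoding) {
        if (contentEncoding == null) {
            return false; // header absent
        }
        String value = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return value.equals("gzip") || value.equals("x-gzip");
    }
}
```

With URLConnection this could be called as ContentEncodings.isGzip(connection.getContentEncoding()), since getContentEncoding() returns null when the header is missing.
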
爱的那么颓废 2024-10-21 11:20:34

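Wrap the original stream in a BufferedInputStream, call mark on it, then wrap that in a GZIPInputStream.
The GZIPInputStream constructor reads the gzip header and throws an IOException if the data is not in gzip format, so if construction succeeds, the stream is gzipped. (ZipEntry belongs to ZipInputStream, which reads zip archives rather than gzip data.) If the check fails, you can use "mark" and "reset" on the BufferedInputStream to return to the initial position in the stream.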
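
A variation on the same mark/reset idea, sketched under the assumption that peeking at the two magic bytes is enough: instead of constructing a GZIPInputStream and catching the exception, read two bytes and rewind. The names MarkResetPeek, startsWithGzipMagic and maybeDecompress are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Illustrative sketch; class and method names are assumptions.
final class MarkResetPeek {

    // Peeks at the first two bytes and rewinds, so the caller sees the stream from its start.
    static boolean startsWithGzipMagic(BufferedInputStream in) throws IOException {
        in.mark(2);
        try {
            int b0 = in.read();
            int b1 = in.read();
            // read() returns -1 at end of stream, which never matches the magic number
            return b0 != -1 && b1 != -1 && ((b1 << 8) | b0) == GZIPInputStream.GZIP_MAGIC;
        } finally {
            in.reset();
        }
    }

    // Wraps the stream in a GZIPInputStream only when the magic number is present.
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        return startsWithGzipMagic(in) ? new GZIPInputStream(in) : in;
    }
}
```

Because the peek never reads past the two marked bytes, empty and single-byte streams are returned unchanged.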

无人问我粥可暖 2024-10-21 11:20:34

SimpleMagic is a Java library for resolving content types:

<!-- pom.xml -->
    <dependency>
        <groupId>com.j256.simplemagic</groupId>
        <artifactId>simplemagic</artifactId>
        <version>1.8</version>
    </dependency>

import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;
import com.j256.simplemagic.ContentType;
// ...

public class SimpleMagicSmokeTest {

    private final static Logger log = LoggerFactory.getLogger(SimpleMagicSmokeTest.class);

    @Test
    public void smokeTestSimpleMagic() throws IOException {
        ContentInfoUtil util = new ContentInfoUtil();
        InputStream possibleGzipInputStream = getGzipInputStream();
        ContentInfo info = util.findMatch(possibleGzipInputStream);

        log.info( info.toString() );
        assertEquals( ContentType.GZIP, info.getContentType() );
    }
}
智商已欠费 2024-10-21 11:20:34

This is how to read a file that may or may not be gzipped:

private void read(final File file)
        throws IOException {
    InputStream stream = null;
    try (final InputStream inputStream = new FileInputStream(file);
            final BufferedInputStream bInputStream = new BufferedInputStream(inputStream);) {
        bInputStream.mark(1024);
        try {
            stream = new GZIPInputStream(bInputStream);
        } catch (final ZipException e) {
            // not gzipped OR not supported zip format
            bInputStream.reset();
            stream = bInputStream;
        }
        // USE STREAM HERE
    } finally {
        if (stream != null) {
            stream.close();
        }
    }
}
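
The mark and reset calls are what make this work. The GZIPInputStream constructor reads the header from the underlying stream before it throws, so falling back to the same stream without rewinding (as in the question) loses the first bytes, which would explain the invalid values. The sketch below contrasts the two approaches on in-memory data; all names in it are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

// Illustrative sketch; class and method names are assumptions.
final class HeaderConsumptionDemo {

    // Mirrors the approach from the question: reuse the same stream after a failed attempt.
    static byte[] readNaive(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        try {
            InputStream result;
            try {
                result = new GZIPInputStream(in);
            } catch (IOException notGzip) {
                result = in; // the constructor has already read header bytes from 'in'
            }
            return drain(result);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same attempt, but rewinds to the marked start before falling back.
    static byte[] readWithReset(byte[] data) {
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        in.mark(1024);
        try {
            InputStream result;
            try {
                result = new GZIPInputStream(in);
            } catch (IOException notGzip) {
                in.reset(); // undo the header read
                result = in;
            }
            return drain(result);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the stream to the end and returns everything that was read.
    private static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

For plain input such as "plain text", readNaive returns a truncated copy, while readWithReset returns the original bytes.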