zlib 压缩字节数组？

发布于 2024-11-14 09:33:01 字数 1685 浏览 3 评论 0原文

我有这个未压缩的字节数组：

0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00

并且我需要使用 deflate 算法（在 zlib 中实现）对其进行压缩，从我搜索到的 C# 中的等效算法将使用 GZipStream 但我根本无法匹配压缩结果。

这是压缩代码：

public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}

这是上述压缩代码的结果：

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00

这是我期望的结果：

78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8

我做错了什么，有人可以帮助我吗？

原文

I have this uncompressed byte array:

0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00

And I need to compress it using the deflate algorithm (implemented in zlib), from what I searched the equivalent in C# would be using GZipStream but I can't match the compressed resulted at all.

Here is the compressing code:

public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}

Here is the result of the above compressing code:

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00

Here is the result I am expecting:

78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8

What I am doing wrong, could some one help me out there ?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

泛泛之交 2024-11-21 09:33:01

首先，一些信息：DEFLATE是压缩算法，它在RFC 1951中定义。 DEFLATE 用于 ZLIB 和 GZIP 格式，定义于 RFC 1950 和 1952 分别，本质上很薄DEFLATE 字节流的包装器。包装器提供元数据，例如文件名、时间戳、CRC 或 Adler 等。

.NET 的基类库实现了 DeflateStream，当用于压缩时，它会生成原始 DEFLATE 字节流。当用于解压缩时，它会消耗原始 DEFLATE 字节流。 .NET 还提供了 GZipStream，它只是该基础的 GZIP 包装器。 .NET 基类库中没有 ZlibStream - 没有任何东西可以生成或使用 ZLIB。有一些技巧可以做，你可以搜索一下。

.NET 中的 deflate 逻辑表现出一种行为异常，即先前压缩的数据在“压缩”时实际上可能会显着膨胀。这是的来源Microsoft 提出的 Connect 错误，以及已在此处讨论过。这可能就是您所看到的，就无效压缩而言。微软已经拒绝了这个bug，因为虽然它对于节省空间无效，但压缩流并不是无效的，换句话说，它可以被任何兼容的DEFLATE引擎“解压缩”。

无论如何，正如其他人发布的那样，不同压缩器产生的压缩字节流可能不一定相同。这取决于它们的默认设置以及压缩器的应用程序指定设置。即使压缩的字节流不同，它们仍然可能解压缩为相同的原始字节流。另一方面，您用来压缩的东西是 GZIP，而您想要的似乎是 ZLIB。虽然它们是相关的，但它们并不相同；您不能使用 GZipStream 生成 ZLIB 字节流。这是您所看到的差异的主要来源。

我想你想要一个 ZLIB 流。

DotNetZip 项目中的免费托管 Zlib 实现了所有三种格式（DEFLATE、ZLIB、GZIP）的压缩流。 DeflateStream 和 GZipStream 的工作方式与 .NET 内置类相同，并且其中有一个 ZlibStream 类，它可以完成您认为它所做的事情。这些类都没有表现出我上面描述的行为异常。

在代码中它看起来像这样：

    byte[] original = new byte[] {
        0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
        0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
        0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);

输出是这样的：

0000    78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .

要解压缩，

    var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);

您可以看到静态 CompressBuffer 方法的文档。

编辑

提出了问题，为什么DotNetZip为前两个字节生成78 DA而不是78 9C？差异并不重要。 78 DA 编码“最大压缩”，而 78 9C 编码“默认压缩”。正如您在数据中看到的，对于这个小样本，无论使用 BEST 还是 DEFAULT，实际压缩字节都是完全相同的。此外，在解压缩期间不使用压缩级别信息。它对您的应用程序没有影响。

如果您不想要“最大”压缩，换句话说，如果您非常希望将 78 9C 作为前两个字节，即使这并不重要，那么您不能使用 < code>CompressBuffer 便利函数，它在幕后使用最佳压缩级别。相反，你可以这样做：

  var compress = new Func<byte[], byte[]>( a => {
        using (var ms = new System.IO.MemoryStream())
        {
            using (var compressor =
                   new Ionic.Zlib.ZlibStream( ms, 
                                              CompressionMode.Compress,
                                              CompressionLevel.Default )) 
            {
                compressor.Write(a,0,a.Length);
            }

            return ms.ToArray();
        }
    });

  var original = new byte[] { .... };
  var compressed = compress(original);

结果是：

0000    78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .

First, some information: DEFLATE is the compression algorithm, it is defined in RFC 1951. DEFLATE is used in the ZLIB and GZIP formats, defined in RFC 1950 and 1952 respectively, which essentially are thin wrappers around DEFLATE bytestreams. The wrappers provide metadata such as, the name of the file, timestamps, CRCs or Adlers, and so on.

.NET's base class library implements a DeflateStream that produces a raw DEFLATE bytestream, when used for compression. When used in decompression it consumes a raw DEFLATE bytestream. .NET also provides a GZipStream, which is just a GZIP wrapper around that base. There is no ZlibStream in the .NET base class library - nothing that produces or consumes ZLIB. There are some tricks to doing it, you can search around.

The deflate logic in .NET exhibits a behavioral anomaly, where previously compressed data can actually be inflated, significantly, when "compressed". This was the source of a Connect bug raised with Microsoft, and has been discussed here on SO. This may be what you are seeing, as far as ineffective compression. Microsoft have rejected the bug, because while it is ineffective for saving space, the compressed stream is not invalid, in other words it can be "decompressed" by any compliant DEFLATE engine.

In any case, as someone else posted, the compressed bytestream produced by different compressors may not necessarily be the same. It depends on their default settings, and the application-specified settings for the compressor. Even though the compressed bytestreams are different, they may still decompress to the same original bytestream. On the other hand the thing you used to compress was GZIP, while it appears what you want is ZLIB. While they are related, they are not the same; you cannot use GZipStream to produce a ZLIB bytestream. This is the primary source of the difference you see.

I think you want a ZLIB stream.

The free managed Zlib in the DotNetZip project implements compressing streams for all of the three formats (DEFLATE, ZLIB, GZIP). The DeflateStream and GZipStream work the same way as the .NET builtin classes, and there's a ZlibStream class in there, that does what you think it does. None of these classes exhibit the behavior anomaly I described above.

In code it looks like this:

    byte[] original = new byte[] {
        0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
        0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
        0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);

The output is like this:

0000    78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .

To decompress,

    var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);

You can see the documentation on the static CompressBuffer method.

EDIT

The question is raised, why is DotNetZip producing 78 DA for the first two bytes instead of 78 9C? The difference is immaterial. 78 DA encodes "max compression", while 78 9C encodes "default compression". As you can see in the data, for this small sample, the actual compressed bytes are exactly the same whether using BEST or DEFAULT. Also, the compression level information is not used during decompression. It has no effect in your application.

If you don't want "max" compression, in other words if you are very set on getting 78 9C as the first two bytes, even though it doesn't matter, then you cannot use the CompressBuffer convenience function, which uses the best compression level under the covers. Instead you can do this:

  var compress = new Func<byte[], byte[]>( a => {
        using (var ms = new System.IO.MemoryStream())
        {
            using (var compressor =
                   new Ionic.Zlib.ZlibStream( ms, 
                                              CompressionMode.Compress,
                                              CompressionLevel.Default )) 
            {
                compressor.Write(a,0,a.Length);
            }

            return ms.ToArray();
        }
    });

  var original = new byte[] { .... };
  var compressed = compress(original);

The result is:

0000    78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .

回复收藏 0 原文

蝶…霜飞 2024-11-21 09:33:01

很简单，您得到的内容有一个 GZip 标头。您想要的是更简单的 Zlib 标头。 ZLib 具有 GZip 标头、Zlib 标头或无标头的选项。通常使用 Zlib 标头，除非数据与磁盘文件关联（在这种情况下使用 GZip 标头）。显然，.Net 库无法编写 zlib 标头（尽管这是迄今为止最常见的）文件格式中使用的标头）。尝试 http://dotnetzip.codeplex.com/。

您可以使用 HexEdit（操作 -> 压缩 -> 设置）快速测试所有不同的 zlib 选项。请参阅 http://www.hexedit.com 。我花了 10 分钟检查你的数据，只需将你的压缩字节粘贴到 HexEdit 中并解压缩。还尝试使用 GZip 和 ZLib 标头压缩原始字节作为双重检查。请注意，您可能需要修改设置才能准确获得您期望的字节。

回复收藏 0 原文

~没有更多了~