为什么 gzip/deflate 压缩小文件会导致许多尾随零？

发布于 2024-09-13 18:39:14 字数 1291 浏览 4 评论 0原文

我使用以下代码在 C# 中压缩一个小 (~4kB) HTML 文件。

byte[] fileBuffer = ReadFully(inFile, ResponsePacket.maxResponsePayloadLength); // Read the entire requested HTML file into a memory buffer
inFile.Close();                                                                 // Close the requested HTML file

byte[] payload;
using (MemoryStream compMS = new MemoryStream())                                       // Create a new memory stream to hold the compressed HTML data
{
    using (GZipStream gzip = new GZipStream(compMS, CompressionMode.Compress))            // Create a new GZip object pointing to the empty memory stream
    {
        gzip.Write(fileBuffer, 0, fileBuffer.Length);                                   // Compress the file buffer and write it to the empty memory stream
        gzip.Close();                                                                   // Close the GZip object
    }
    payload = compMS.GetBuffer();                                            // Write the compressed file buffer data in the memory stream to a byte buffer
}

生成的压缩数据约为 2k，但其中大约一半只是零。这是针对带宽非常敏感的应用程序（这就是为什么我首先要费心压缩 4kB），因此额外的 1kB 零浪费了宝贵的空间。我最好的猜测是压缩算法将数据填充到块边界。如果是这样，有什么方法可以覆盖此行为或更改块大小？我使用 vanilla .NET GZipStream 和 zlib 的 GZipStream 以及 DeflateStream 得到了相同的结果。

原文

I'm using the following code to compress a small (~4kB) HTML file in C#.

byte[] fileBuffer = ReadFully(inFile, ResponsePacket.maxResponsePayloadLength); // Read the entire requested HTML file into a memory buffer
inFile.Close();                                                                 // Close the requested HTML file

byte[] payload;
using (MemoryStream compMS = new MemoryStream())                                       // Create a new memory stream to hold the compressed HTML data
{
    using (GZipStream gzip = new GZipStream(compMS, CompressionMode.Compress))            // Create a new GZip object pointing to the empty memory stream
    {
        gzip.Write(fileBuffer, 0, fileBuffer.Length);                                   // Compress the file buffer and write it to the empty memory stream
        gzip.Close();                                                                   // Close the GZip object
    }
    payload = compMS.GetBuffer();                                            // Write the compressed file buffer data in the memory stream to a byte buffer
}

The resulting compressed data is about 2k, but about half of it is just zeroes. This is for a very bandwidth sensitive application (which is why I'm bothering to compress 4kB in the first place), so the extra 1kB of zeroes is wasted valuable space. My best guess would be that the compression algorithm is padding out the data to a block boundary. If so, is there any way to override this behavior or change the block size? I get the same results with vanilla .NET GZipStream and zlib's GZipStream, as well as DeflateStream.

分享到QQ

分享到微博