Why does gzip/deflate compression of a small file produce many trailing zeros?
I'm using the following code to compress a small (~4kB) HTML file in C#.
```csharp
byte[] fileBuffer = ReadFully(inFile, ResponsePacket.maxResponsePayloadLength); // Read the entire requested HTML file into a memory buffer
inFile.Close(); // Close the requested HTML file
byte[] payload;
using (MemoryStream compMS = new MemoryStream()) // Create a new memory stream to hold the compressed HTML data
{
    using (GZipStream gzip = new GZipStream(compMS, CompressionMode.Compress)) // Create a new GZip object pointing to the empty memory stream
    {
        gzip.Write(fileBuffer, 0, fileBuffer.Length); // Compress the file buffer and write it to the empty memory stream
        gzip.Close(); // Close the GZip object
    }
    payload = compMS.GetBuffer(); // Write the compressed file buffer data in the memory stream to a byte buffer
}
```
The resulting compressed data is about 2 kB, but about half of it is just zeroes. This is for a very bandwidth-sensitive application (which is why I'm bothering to compress 4 kB in the first place), so the extra 1 kB of zeroes wastes valuable space. My best guess is that the compression algorithm is padding the data out to a block boundary. If so, is there any way to override this behavior or change the block size? I get the same results with the vanilla .NET GZipStream and zlib's GZipStream, as well as with DeflateStream.
Answer:
Wrong MemoryStream method. GetBuffer() returns the underlying buffer, which is always larger than (or exactly as large as) the data in the stream. That is very efficient because no copy needs to be made.
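A small sketch of the difference between the backing buffer and the stream's data, assuming a stream created with an explicit 1024-byte capacity so the backing-array size is predictable. The `MemoryStreamDemo.Measure` helper, the capacity and the data are illustrative and not from the original.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only: shows how GetBuffer(),
// Length and ToArray() relate for a MemoryStream with spare capacity.
static class MemoryStreamDemo
{
    public static (int BufferLength, long StreamLength, int ArrayLength, bool TailIsZero)
        Measure(int capacity, byte[] data)
    {
        // new MemoryStream(capacity) allocates a backing array of exactly
        // `capacity` bytes up front (it grows only if more data is written).
        using var ms = new MemoryStream(capacity);
        ms.Write(data, 0, data.Length);

        byte[] raw = ms.GetBuffer();  // the whole backing array, no copy
        byte[] exact = ms.ToArray();  // a new array holding only the bytes written

        // Everything past ms.Length is unused capacity; in a fresh array it is zero.
        bool tailIsZero = raw.Skip((int)ms.Length).All(b => b == 0);

        return (raw.Length, ms.Length, exact.Length, tailIsZero);
    }
}
```

For example, `MemoryStreamDemo.Measure(1024, new byte[] { 1, 2, 3 })` reports a buffer length of 1024 but a stream length and `ToArray()` length of 3, and the 1021 unused bytes are all zero. With a plain `new MemoryStream()` the capacity grows automatically as data is written, so the exact `GetBuffer().Length` depends on that growth policy, but the effect is the same as the zero-filled tail of `payload` in the question.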
But here you need the ToArray() method, or use the Length property to take only the first Length bytes of that buffer.
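Applied to the code in the question, the fix could look like the following sketch (the `GzipHelper` class and its method names are hypothetical). One detail matters for the Length option: by default, disposing or closing a GZipStream also closes the MemoryStream it wraps, and `MemoryStream.Length` throws `ObjectDisposedException` on a closed stream, whereas `ToArray()` still works. The no-copy variant therefore passes `leaveOpen: true`.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper names; both variants return only the bytes the
// GZipStream actually produced, without the unused buffer capacity.
static class GzipHelper
{
    // Variant 1: ToArray() copies exactly Length bytes. It works on a closed
    // MemoryStream, so it is fine that disposing gzip also closed compMS.
    public static byte[] CompressToArray(byte[] input)
    {
        using var compMS = new MemoryStream();
        using (var gzip = new GZipStream(compMS, CompressionMode.Compress))
        {
            gzip.Write(input, 0, input.Length);
        } // disposing gzip flushes the final deflate block and the gzip trailer
        return compMS.ToArray();
    }

    // Variant 2: no copy. Length throws on a closed MemoryStream, so
    // leaveOpen: true keeps compMS open after gzip is disposed.
    public static ArraySegment<byte> CompressNoCopy(byte[] input)
    {
        using var compMS = new MemoryStream();
        using (var gzip = new GZipStream(compMS, CompressionMode.Compress, leaveOpen: true))
        {
            gzip.Write(input, 0, input.Length);
        }
        // Only the first compMS.Length bytes of GetBuffer() are compressed output;
        // the segment keeps a reference to the backing array after compMS is disposed.
        return new ArraySegment<byte>(compMS.GetBuffer(), 0, (int)compMS.Length);
    }
}
```

The segment's `Array`, `Offset` and `Count` can be passed to any API that takes a buffer/offset/count triple, such as `Stream.Write(byte[], int, int)`. For a payload of about 2 kB the extra copy made by `ToArray()` is negligible, so variant 1 is probably the simpler choice.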