GZipStream effectiveness
I am trying to save a large UInt16 array to a file. positionCnt is about 50000 and stationCnt is about 2500. Saved directly, without GZipStream, the file is about 250 MB, which an external zip program can compress to 19 MB. With the following code the file is 507 MB. What am I doing wrong?

    GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
    BinaryWriter fs = new BinaryWriter(cmp);
    fs.Write((Int32)(positionCnt * stationCnt));
    for (int p = 0; p < positionCnt; p++)
    {
        for (int s = 0; s < stationCnt; s++)
        {
            fs.Write(BoundData[p, s]);
        }
    }
    fs.Close();
Comments (2)
Not sure which version of .NET you're running. In earlier versions, it used a window size equal to the size of the buffer you wrote from, so in your case it would try to compress each integer individually. I think they changed that in .NET 4.0, but I haven't verified that.

In any case, what you want to do is create a buffered stream ahead of the GZipStream:

    // Create file stream with 64 KB buffer
    FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None, 65536);
    GZipStream cmp = new GZipStream(fs, CompressionMode.Compress);
    ...

This way, the GZipStream gets data in 64 KB chunks and can do a much better job of compressing. Buffers larger than 64 KB won't give you any better compression.
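The snippet above stops at "...", so how the buffer ends up in front of the GZipStream is not shown. One way to read it, offered only as a sketch and not as this answer's actual code, is to place a BufferedStream between the BinaryWriter and the GZipStream so that the many 2-byte writes are collected into 64 KB blocks before they reach the compressor. The class name BufferedGzipWriter and the Save method are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original answer.
public static class BufferedGzipWriter
{
    // File layout matches the question: Int32 element count, then each UInt16 in row-major order.
    public static void Save(string path, ushort[,] data)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);

        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 65536))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        // Assumption: the buffer sits in front of the GZipStream, so the
        // compressor receives 64 KB blocks instead of individual 2-byte writes.
        using (var buffered = new BufferedStream(gzip, 65536))
        using (var writer = new BinaryWriter(buffered))
        {
            writer.Write(rows * cols);
            for (int r = 0; r < rows; r++)
            {
                for (int c = 0; c < cols; c++)
                {
                    writer.Write(data[r, c]);
                }
            }
        }
    }
}
```

Disposing the writer flushes the BufferedStream into the GZipStream and then closes the file, so no explicit Close call is needed. Whether this changes the output size depends on the .NET version, as the answer itself notes.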
For whatever reason, which was not apparent to me from a quick read of the GZip implementation in .NET, the performance is sensitive to the amount of data written at once. I benchmarked your code against a few styles of writing to the GZipStream and found that the most efficient version wrote long strides to the disk.

The trade-off in this case is memory, as you need to convert the short[,] to byte[] based on the stride length you'd like:
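A minimal sketch of that conversion, not the benchmarked code from this answer: it copies a fixed number of rows at a time into a reusable byte[] with Buffer.BlockCopy and hands each block to the GZipStream in a single Write call. The names StridedGzipWriter, Save, and rowsPerStride are hypothetical, and the array is typed ushort[,] to match the UInt16 data in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original answer.
public static class StridedGzipWriter
{
    // rowsPerStride is an illustrative tuning knob: more rows per stride
    // means fewer, larger writes at the cost of a bigger temporary buffer.
    public static void Save(string path, ushort[,] data, int rowsPerStride)
    {
        if (rowsPerStride < 1) throw new ArgumentOutOfRangeException(nameof(rowsPerStride));

        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        int rowBytes = cols * sizeof(ushort);
        byte[] chunk = new byte[rowsPerStride * rowBytes];

        using (var file = File.Open(path, FileMode.Create))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            // Same Int32 element-count header as the question's code.
            // Assumes a little-endian machine, so the bytes match BinaryWriter's output.
            gzip.Write(BitConverter.GetBytes(rows * cols), 0, sizeof(int));

            for (int r = 0; r < rows; r += rowsPerStride)
            {
                int n = Math.Min(rowsPerStride, rows - r);
                // BlockCopy offsets are in bytes; a 2D array is stored row-major,
                // so row r starts at byte offset r * rowBytes.
                Buffer.BlockCopy(data, r * rowBytes, chunk, 0, n * rowBytes);
                gzip.Write(chunk, 0, n * rowBytes);
            }
        }
    }
}
```

With the sizes from the question, one row is 2500 × 2 = 5000 bytes, so a call such as StridedGzipWriter.Save(cacheFileName, BoundData, 100) would use a 500 KB temporary buffer. The value 100 is only an example, not a measured optimum.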