GZipStream effectiveness

Posted 2024-12-06 18:42:59

I am trying to save big UInt16 array into a file. positionCnt is about 50000, stationCnt is about 2500. Saved directly, without GZipStream, the file is about 250MB which can be compressed by external zip program to 19MB. With the following code the file is 507MB. What do I do wrong?

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BinaryWriter fs = new BinaryWriter(cmp);
fs.Write((Int32)(positionCnt * stationCnt));
for (int p = 0; p < positionCnt; p++)
{
    for (int s = 0; s < stationCnt; s++)
    {
       fs.Write(BoundData[p, s]);
    }
}
fs.Close();

Comments (2)

∝单色的世界 2024-12-13 18:42:59

Not sure what version of .NET you're running on. In earlier versions, it used a window size that was the same size as the buffer that you wrote from. So in your case it would try to compress each integer individually. I think they changed that in .NET 4.0, but haven't verified that.

In any case, what you want to do is create a buffered stream ahead of the GZipStream:

// Create file stream with 64 KB buffer
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None, 65536);
GZipStream cmp = new GZipStream(fs, CompressionMode.Compress);
...

Better yet, put the buffer between the BinaryWriter and the GZipStream, so the compressor itself receives the data in large chunks:

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BufferedStream buffStrm = new BufferedStream(cmp, 65536);
BinaryWriter fs = new BinaryWriter(buffStrm);

This way, the GZipStream gets data in 64 Kbyte chunks, and can do a much better job of compressing.

Buffers larger than 64KB won't give you any better compression.
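
Putting the pieces together, a minimal sketch of the question's write loop with the suggested 64 KB BufferedStream in place might look like the following (assuming BoundData, positionCnt, stationCnt, and cacheFileName are defined as in the question, with the System.IO and System.IO.Compression namespaces in scope):

// Sketch only: the question's loop, with a 64 KB BufferedStream
// inserted between the BinaryWriter and the GZipStream.
using (var file = File.Open(cacheFileName, FileMode.Create))
using (var cmp = new GZipStream(file, CompressionMode.Compress))
using (var buffStrm = new BufferedStream(cmp, 65536))
using (var writer = new BinaryWriter(buffStrm))
{
    writer.Write((Int32)(positionCnt * stationCnt));
    for (int p = 0; p < positionCnt; p++)
    {
        for (int s = 0; s < stationCnt; s++)
        {
            // Each UInt16 still goes through BinaryWriter one value at a
            // time, but the BufferedStream hands GZipStream 64 KB chunks.
            writer.Write(BoundData[p, s]);
        }
    }
}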

美人如玉 2024-12-13 18:42:59

For whatever reason, which is not apparent to me during a quick read of the GZip implementation in .Net, the performance is sensitive to the amount of data written at once. I benchmarked your code against a few styles of writing to the GZipStream and found the most efficient version wrote long strides to the disk.

The trade-off is memory in this case, as you need to convert the short[,] to byte[] based on the stride length you'd like:

using (var writer = new GZipStream(File.Create("compressed.gz"),
                                   CompressionMode.Compress))
{
    var bytes = new byte[data.GetLength(1) * 2];
    for (int ii = 0; ii < data.GetLength(0); ++ii)
    {
        Buffer.BlockCopy(data, bytes.Length * ii, bytes, 0, bytes.Length);
        writer.Write(bytes, 0, bytes.Length);
    }

    // Random data written to every other 4 shorts
    // 250,000,000 uncompressed.dat
    // 165,516,035 compressed.gz (1 row strides)
    // 411,033,852 compressed2.gz (your version)
}
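
Neither answer shows the read side. As a purely hypothetical sketch (not part of the original answers), a cache file written in the question's layout, a 32-bit element count followed by the UInt16 values, could be read back by reversing the steps:

// Hypothetical read-back sketch: decompress and rebuild a flat
// UInt16 array from the element count stored in the header.
using (var file = File.OpenRead(cacheFileName))
using (var cmp = new GZipStream(file, CompressionMode.Decompress))
using (var reader = new BinaryReader(cmp))
{
    int count = reader.ReadInt32();      // positionCnt * stationCnt
    var data = new UInt16[count];
    for (int i = 0; i < count; i++)
    {
        data[i] = reader.ReadUInt16();
    }
}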