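What is the best way (i.e. the best combination of compression ratio and speed) to compress a directory of large files (between 100 and 300 MB each) in C#?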

Posted on 2024-12-15 03:38:25

I’m writing a console app to compress a directory of large files (around 30), each around 100-300 MB. It will run once per day, as new files come in. I’ve tried the built-in GZipStream class: it took about 15 seconds per file, with a compression ratio of about 0.212. Is there a more efficient way using third-party libraries, or some way to improve the compression ratio? Finally, is threading an option to speed this process up?

Here's the code I'm currently using (it's basically from the MSDN article on GZipStream):

// Requires: using System; using System.Diagnostics; using System.IO; using System.IO.Compression;
// Timer is assumed to be a Stopwatch field on the containing class.
private void CompressFile(FileInfo fileInfo)
{
    // Skip hidden files and files that are already compressed.
    bool isHidden = (fileInfo.Attributes & FileAttributes.Hidden) == FileAttributes.Hidden;
    if (isHidden || fileInfo.Extension == ".gz")
        return;

    string outPath = fileInfo.FullName + ".gz";
    Timer.Reset();

    // Open the source file, create the compressed file, and wrap it in a GZipStream.
    using (FileStream inFile = fileInfo.OpenRead())
    using (FileStream outFile = File.Create(outPath))
    using (GZipStream compress = new GZipStream(outFile, CompressionMode.Compress))
    {
        // Copy the source file into the compression stream.
        Timer.Start();
        inFile.CopyTo(compress);
    }

    // Stop only after the streams are disposed, so the final flush is included
    // in the timing and the output file has its final length.
    Timer.Stop();

    Console.WriteLine("Compressed {0} from {1} to {2} bytes in {3} seconds.",
        fileInfo.Name, fileInfo.Length, new FileInfo(outPath).Length, Timer.ElapsedMilliseconds / 1000.0);
}

Thanks!

Comments (2)

做个ˇ局外人 2024-12-22 03:38:25

This answer, "Is it safe to call ICsharpCode.SharpZipLib in parallel on multiple threads", gives some comparisons of GZIP compression alternatives.

Your data is large enough that you could benefit from doing compression in parallel.

This sample code does the parallel compression.

Compared to the built-in GZipStream, the parallel approach takes about half the time and yields "a little better" compression.

DotNetZip also has classes for BZip2 compression (including a parallel implementation). BZip2 is much slower than GZIP, but gives you a better compression ratio.
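As a rough illustration of the threading question, here is a minimal sketch that compresses several files at a time using only the built-in GZipStream and Parallel.ForEach. This is file-level parallelism, which is not the same technique as the block-level parallel compression in the sample referenced above. The names ParallelGzip, CompressFile, CompressDirectory and the maxParallel parameter are made up for this example, and it assumes .NET 4.5 or later (for the GZipStream constructor that takes a CompressionLevel).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelGzip
{
    // Compresses one file to "<path>.gz" and returns the compressed size in bytes.
    public static long CompressFile(string path, CompressionLevel level)
    {
        string outPath = path + ".gz";
        using (FileStream inFile = File.OpenRead(path))
        using (FileStream outFile = File.Create(outPath))
        using (var gzip = new GZipStream(outFile, level))
        {
            inFile.CopyTo(gzip);
        }
        // Read the length only after disposal, once all compressed data is flushed.
        return new FileInfo(outPath).Length;
    }

    // Compresses every non-hidden, non-.gz file in the directory,
    // processing up to maxParallel files at the same time.
    // Returns the number of files that were compressed.
    public static int CompressDirectory(string directory, int maxParallel)
    {
        var files = new DirectoryInfo(directory).GetFiles()
            .Where(f => (f.Attributes & FileAttributes.Hidden) == 0 && f.Extension != ".gz")
            .ToList();

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(files, options, f =>
        {
            CompressFile(f.FullName, CompressionLevel.Optimal);
        });
        return files.Count;
    }
}
```

A call such as ParallelGzip.CompressDirectory(@"C:\data\incoming", Environment.ProcessorCount) would be a starting point (the path is a placeholder). Whether this helps depends on whether the job is CPU-bound or disk-bound, so it is worth timing with a few values of maxParallel.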

梦里泪两行 2024-12-22 03:38:25

There is no generic way. You need to profile it for the

  • payload
  • file system
  • CPU load and capacity

You could pass the Level parameter to the GZipStream constructor.
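To profile the effect of the level on your own payload, something like the following sketch could be used. It assumes .NET 4.5 or later, where GZipStream has a constructor taking a CompressionLevel; LevelBenchmark, Measure and Report are hypothetical names for this example, and the ratio is computed as compressed size divided by original size.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class LevelBenchmark
{
    // Compresses the file at the given level into a temporary file and
    // returns the compressed size and the elapsed time. The temporary
    // file is deleted afterwards.
    public static (long CompressedBytes, TimeSpan Elapsed) Measure(string path, CompressionLevel level)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            var watch = Stopwatch.StartNew();
            using (FileStream inFile = File.OpenRead(path))
            using (FileStream outFile = File.Create(tempPath))
            using (var gzip = new GZipStream(outFile, level))
            {
                inFile.CopyTo(gzip);
            }
            // Stop after disposal so the final flush is included in the timing.
            watch.Stop();
            return (new FileInfo(tempPath).Length, watch.Elapsed);
        }
        finally
        {
            File.Delete(tempPath);
        }
    }

    // Prints ratio and time for the levels available since .NET 4.5.
    public static void Report(string path)
    {
        long original = new FileInfo(path).Length;
        foreach (var level in new[] { CompressionLevel.Fastest, CompressionLevel.Optimal })
        {
            var (bytes, elapsed) = Measure(path, level);
            Console.WriteLine("{0}: ratio {1:F3} in {2:F2} s",
                level, (double)bytes / original, elapsed.TotalSeconds);
        }
    }
}
```

Running Report on a few representative files from the daily batch should show whether the speed gained at the faster level is worth the larger output for this particular data.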

I'd consider using pre-existing (external) tools to do the job. You'll get comparison benchmarks much more quickly, because you don't have to implement the compressors yourself. I'd really suggest the Unix-like tools, but you might have trouble finding them for your Windows platform.
