Best way (i.e. best combination of compression ratio and speed) to compress a directory of large files (between 100 and 300 MB each) in C#?
I'm writing a console app to compress a directory of large files (around 30), with each file coming in at around 100-300 MB. This will be done once per day (as new files come in). I've tried using the built-in GZipStream class, and it took about 15 seconds per file with a compression ratio of about 0.212. I was wondering if there is a more efficient way out there with third-party libraries, or if there's some way to improve the compression ratio. Finally, is threading an option to speed this process up?
Here's the code I'm currently using (it's basically from the MSDN article on GZipStream):
// Timer is assumed to be a System.Diagnostics.Stopwatch field of the containing class.
private void CompressFile(FileInfo fileInfo)
{
    // Prevent compressing hidden and already compressed files.
    bool isHidden = (fileInfo.Attributes & FileAttributes.Hidden) == FileAttributes.Hidden;
    if (isHidden || fileInfo.Extension == ".gz")
    {
        return;
    }

    string outPath = fileInfo.FullName + ".gz";
    Timer.Reset();

    // Get the stream of the source file.
    using (FileStream inFile = fileInfo.OpenRead())
    // Create the compressed file.
    using (FileStream outFile = File.Create(outPath))
    using (GZipStream compress = new GZipStream(outFile, CompressionMode.Compress))
    {
        // Copy the source file into the compression stream.
        Timer.Start();
        inFile.CopyTo(compress);
    }
    // Stop only after the streams are disposed, so the final flush is timed
    // and the output length reported below is complete.
    Timer.Stop();

    Console.WriteLine("Compressed {0} from {1} to {2} bytes in {3} seconds.",
        fileInfo.Name, fileInfo.Length, new FileInfo(outPath).Length,
        Timer.ElapsedMilliseconds / 1000.0);
}
Thanks!
Comments (2)
This answer, "Is it safe to call ICsharpCode.SharpZipLib in parallel on multiple threads", gives some comparisons of GZIP compression alternatives.
Your data is large enough that you could benefit from doing compression in parallel.
This sample code does the parallel compression.
Compared to the built-in GZipStream, the parallel approach takes about half the time and gives "a little better" compression.
DotNetZip also has classes for BZip2 compression (including a parallel implementation). BZip2 is much slower than GZIP, but gives you a better compression ratio.
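To illustrate the threading question, here is a minimal sketch that compresses several files at once using only the base class library. Note that this is a different technique from the parallel stream the answer refers to: it does not split a single file across threads, it simply runs the built-in GZipStream on multiple files concurrently. The class name `ParallelGzip` and the `maxParallel` parameter are hypothetical, and the speedup depends on CPU cores and disk throughput, so it needs to be measured on the real data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelGzip
{
    // Compresses one file to "<path>.gz" and returns the compressed size in bytes.
    public static long CompressFile(string path)
    {
        string outPath = path + ".gz";
        using (FileStream inFile = File.OpenRead(path))
        using (FileStream outFile = File.Create(outPath))
        using (var gzip = new GZipStream(outFile, CompressionMode.Compress))
        {
            inFile.CopyTo(gzip);
        }
        // Read the size only after the streams are closed and fully flushed.
        return new FileInfo(outPath).Length;
    }

    // Compresses every non-hidden, non-.gz file in a directory,
    // running at most maxParallel compressions at the same time.
    // Returns the number of files that were compressed.
    public static int CompressDirectory(string directory, int maxParallel)
    {
        string[] files = Directory.GetFiles(directory)
            .Where(f => Path.GetExtension(f) != ".gz"
                     && (File.GetAttributes(f) & FileAttributes.Hidden) == 0)
            .ToArray();

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(files, options, f => { CompressFile(f); });
        return files.Length;
    }
}
```

A call such as `ParallelGzip.CompressDirectory(dir, Environment.ProcessorCount)` would be a reasonable starting point. With roughly 30 files per run, per-file parallelism keeps the code simple and the output stays standard .gz; if the disk is the bottleneck, a lower degree of parallelism may work better.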
There is no generic way. You need to profile it for the
You could pass the Level parameter to the GZipStream constructor.
I'd consider using pre-existing (external) tools to do the job. You'll be much quicker with comparison benchmarks, because you don't have to go and implement them. I'd really suggest the Unix-like tools, but you might have trouble finding them for your Windows platform.
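As a rough sketch of the profiling suggested here, the following compares compression levels on one file. It assumes .NET Framework 4.5 or later, where System.IO.Compression.GZipStream has a constructor taking a CompressionLevel; the class name `GzipLevelProfile` is hypothetical, and "ratio" is assumed to mean compressed size divided by original size.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class GzipLevelProfile
{
    // Compresses src into dst at the given level.
    // Returns compressed size / original size and reports the elapsed time.
    public static double Ratio(string src, string dst, CompressionLevel level, out TimeSpan elapsed)
    {
        Stopwatch watch = Stopwatch.StartNew();
        using (FileStream inFile = File.OpenRead(src))
        using (FileStream outFile = File.Create(dst))
        using (var gzip = new GZipStream(outFile, level))
        {
            inFile.CopyTo(gzip);
        }
        // Stop after disposal so the final flush is included in the timing.
        watch.Stop();
        elapsed = watch.Elapsed;
        return (double)new FileInfo(dst).Length / new FileInfo(src).Length;
    }

    // Prints ratio and time for each level so they can be compared on real data.
    public static void Report(string src)
    {
        foreach (CompressionLevel level in new[] { CompressionLevel.Fastest, CompressionLevel.Optimal })
        {
            TimeSpan elapsed;
            double ratio = Ratio(src, src + "." + level + ".gz", level, out elapsed);
            Console.WriteLine("{0}: ratio {1:F3} in {2:F1} s", level, ratio, elapsed.TotalSeconds);
        }
    }
}
```

Running `GzipLevelProfile.Report` on a representative file shows the trade-off between speed and size for that particular data, which is the number that matters when choosing a level.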