Intelligent file copy algorithm (OS-independent)
I'm trying to create a fast and somewhat intelligent file copy algorithm (in C#, but platform-independent).
My goals:
- I don't want to use any platform specific code (no pinvokes or anything)
- I'd like to take advantage of multiple cores, but this seems pointless since simultaneous reads/writes would presumably be slower, right? (Correct me if I'm wrong.)
- I want to keep track of the copying progress so File.Copy is not an option
The code that I've come up with is nothing special and I'm looking into ways of speeding it up:
public bool Copy(string sourcePath, string destinationPath, ref long copiedSize, long totalSize, int fileNum, int fileCount, CopyProgressCallback progressCallback)
{
    const int size = 1024 * 256; // 256 KB buffer
    byte[] buffer = new byte[size];
    int read;
    try
    {
        // Open inside the try and use 'using' so the source stream is not
        // leaked if opening the destination throws.
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream dest = File.Open(destinationPath, FileMode.Create))
        {
            while ((read = source.Read(buffer, 0, size)) != 0)
            {
                dest.Write(buffer, 0, read);
                copiedSize += read;
                progressCallback(copiedSize, totalSize, fileNum, fileCount);
            }
        }
        return true;
    }
    catch
    {
        // Intentionally swallowing exceptions; error reporting is out of scope here.
        return false;
    }
}
Things I've tried that didn't work out:
- Increasing the buffer size as I go along (loss of speed, and caching problems with CD/DVD)
- Tried 'CopyFileEx' - the pinvoke overhead slowed the copy down
- Tried many different buffer sizes, and 256KB seems to be the best choice
- Tried reading while writing - it slowed things down
- Changed 'progressCallback' to update UI after 1 second (using Stopwatch class) - this has significantly improved speed
ANY suggestions are welcome - I'll be updating the code as I try out new things. Suggestions don't have to be code - just ideas.
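On the multi-core question: one pattern worth benchmarking is a two-buffer pipeline, where a background thread reads ahead into the next chunk while the foreground thread writes the previous one, so read and write latencies overlap instead of adding up. The sketch below is illustrative Java (not the question's C#, and not a definitive implementation); the same structure translates directly to a C# `BlockingCollection<byte[]>`.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: overlap reads and writes with a small bounded queue of chunks.
// A reader thread fills the queue while the caller's thread drains it to disk.
public class PipelinedCopy {
    private static final int CHUNK = 256 * 1024;   // 256 KB, as in the question
    private static final byte[] EOF = new byte[0]; // sentinel marking end of stream

    public static void copy(Path src, Path dst) throws Exception {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4); // small read-ahead window
        Thread reader = new Thread(() -> {
            try (InputStream in = Files.newInputStream(src)) {
                byte[] buf = new byte[CHUNK];
                int n;
                while ((n = in.read(buf)) > 0) {
                    queue.put(Arrays.copyOf(buf, n)); // hand off a right-sized chunk
                }
                queue.put(EOF);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();
        try (OutputStream out = Files.newOutputStream(dst)) {
            byte[] chunk;
            while ((chunk = queue.take()) != EOF) { // sentinel compared by reference
                out.write(chunk);
            }
        }
        reader.join();
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("pipecopy-src", ".bin");
        Path dst = Files.createTempFile("pipecopy-dst", ".bin");
        byte[] data = new byte[1_000_000];
        new Random(42).nextBytes(data);
        Files.write(src, data);
        copy(src, dst);
        System.out.println(Arrays.equals(data, Files.readAllBytes(dst))); // true
    }
}
```

Whether this actually beats a plain loop depends on the hardware: on a single spindle the arm still seeks between source and destination, but across two disks (or disk to SSD) the overlap can pay off, which is worth measuring rather than assuming.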
Multiple cores aren't much use without multiple read/write heads, which probably means multiple disks. Since your question is platform-agnostic, I feel free to suggest using a parallel I/O system and getting all those cores to do their share of the work instead of idling.
If you limit yourself to a single disk with a single read/write arm and one head per surface, you need to minimise movements of the arm. You probably want to read from a track on one surface and write to the same track on another surface. Or you might want to read a sector from one surface and copy it to another sector on the same track of the same surface.
However, all of this involves very low-level (well to me they look very low level) operations. The whole trend in general-purpose computing seems to be continually to give the programmer easy tools to use, at the cost of removing easy access to low level operations. The task you have set yourself is approximately this:
Trick C# into accessing the disk in the way I want it to, rather than in the way it wants to.
Good luck with that :-)
Mark
PS: Your mention of CD/DVD suggests, though you don't state it, that you are trying to make a fast copy from disk to CD/DVD. If so, you might consider doing a disk-to-disk copy first, letting your copier get back to work, and handling the copy from that duplicate to the CD/DVD on another core.
Your speed can be affected by many things: the file system's cluster size, file fragmentation, the disk interface type (IDE/SATA/etc.), other disk operations from other processes, and so on.
Every computer and set of operating parameters will differ and give different results; one code change might increase speed here but decrease it there.
Maybe have a default set of settings for files smaller than, say, 100 MB, and otherwise run a quick set of tests to pre-configure the settings: run a read/write speed test with a set of buffer sizes, and detect whether the source and destination paths are located on separate disks (if so, make the copy multi-threaded: one thread reads while the other writes). Only raise the progress callback for updates that are significant (e.g. progress that has changed by +3%).