Parallel Extensions
I have an application that does heavy IO: copying files, zipping them, moving them around the file system, and copying them to backup servers.
I built this program as a single-threaded application. It runs in 2 minutes.
I built another version of this program with Parallel Extensions and Task, which also runs in almost 2 minutes.
In other words, I didn't see a performance gain from using Parallel Extensions, because of the heavy IO.
Would I get the same results if I deployed the application to a blade server?
Do blade servers process IO faster, or over multiple channels, compared with my workstation?
Is there no benefit to using Parallel Extensions with IO-bound applications?
Comments (5)
If all you're doing is copying or moving files across the system, then the parallelism provided by the TPL isn't going to do you much good. Moving, for example, really doesn't use any CPU; it simply changes the file's location in the disk's directory record structure.
File compression is a different story. Here you're loading data and using the CPU to compress it before saving it out to disk. You might be able to use a pipeline or parallel loop to load/compress/save the data in a more efficient way. Instead of having one thread compress every file, you could have multiple threads working on different files.
The code I tested compresses a batch of files sequentially and then in parallel. I get the following times on an i7 920 with an Intel X25 SSD, compressing 329 JPG images totalling 800 MB of data:
Sequential: 39901 ms
Parallel: 12404 ms
For the compression code, see "How to: Compress Files".
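The code behind those timings is not reproduced on this page. As a rough stand-in, here is a minimal Python sketch of the same experiment (it is not the original TPL code). The helper names `compress_file`, `compress_sequential`, `compress_parallel` and `timed_ms` are made up for illustration, and gzip is assumed as the compression format. The timings above were measured with the original code, so don't expect this sketch to reproduce them.

```python
import gzip
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compress_file(src: Path, dest_dir: Path) -> Path:
    """Gzip one file into dest_dir and return the path of the archive."""
    dest = Path(dest_dir) / (Path(src).name + ".gz")
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so memory stays flat
    return dest


def compress_sequential(files, dest_dir):
    """One thread compresses every file, one after another."""
    return [compress_file(f, dest_dir) for f in files]


def compress_parallel(files, dest_dir, workers=4):
    """Several threads each work on a different file."""
    # Assumption: CPython's zlib releases the GIL while compressing,
    # so the worker threads can actually overlap on multiple cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: compress_file(f, dest_dir), files))


def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000
```

Calling `timed_ms(compress_sequential, files, out_dir)` and `timed_ms(compress_parallel, files, out_dir)` on the same file list gives a comparison in the spirit of the one above. Any gain comes from the compression step, not from the disk reads and writes.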
If you're moving files around on one physical device, you're not going to see much performance benefit from making multiple parallel IO requests to the same one device. The device is already operating many orders of magnitude slower than the CPU, so multiple requests made in parallel will still line up to be handled one by one on the device. Your parallel code is being serialized because it's all accessing the same device that can't really handle more than one request at a time.
You might see a tiny perf improvement with parallel code if your disk controller implements "elevator seeks", "scatter-gather", or other out-of-order operations, but the perf difference will be relatively small.
Where you should find a more rewarding perf difference for file I/O is when you're moving files between many different physical devices. You should be able to move or copy a file on disk A to some other location on disk A while also copying a file on disk B to disk C. With many physical devices, you don't have all the parallel requests stacking up waiting for the one device to fill all the requests.
You'll probably see similar results with network I/O: if everything is going through one Ethernet card / network segment, you're not going to realize as much parallelism as when you have multiple Ethernet cards and multiple network segments to work with.
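One way to act on the "different physical devices" point is to run copy jobs that share a device one after another, and only run jobs on different devices in parallel. The Python sketch below illustrates that idea under some assumptions: `copy_grouped_by_device` and `source_volume` are hypothetical names, jobs are `(src, dst)` pairs, and the default grouping key is `os.stat(src).st_dev`, which identifies a volume rather than a physical disk. Real code would need its own mapping from paths to physical devices, and it would also have to consider the destination device.

```python
import os
import shutil
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def source_volume(job):
    """Default grouping key: the volume ID of the job's source file.

    Assumption: st_dev is a good enough proxy for "same device" here.
    """
    src, _dst = job
    return os.stat(src).st_dev


def copy_grouped_by_device(jobs, key=source_volume):
    """Copy (src, dst) pairs, one worker thread per device group."""
    groups = defaultdict(list)
    for job in jobs:
        groups[key(job)].append(job)
    if not groups:
        return []

    def run_group(group):
        # Jobs in the same group run one by one, so they don't queue up
        # against each other on the same device.
        return [shutil.copy2(src, dst) for src, dst in group]

    # Different groups run at the same time, one thread per group.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        results = list(pool.map(run_group, groups.values()))
    return [path for group in results for path in group]
```

If every file lives on the same volume, this falls back to a plain sequential copy, which matches the point above that a single device serializes the requests anyway.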
I think the advantage of Parallel Extensions could be significant for CPU-bound operations. I don't know how it's supposed to affect IO, though.
It all depends on whether you are CPU bound or IO bound. I would suggest doing some performance testing to see where your bottlenecks are.
If you find you are moving and compressing a lot of files (to different disks, as a move on the same disk is just a FAT table change), you might want to look at implementing a streaming file mover that compresses as it moves. This can save the extra IO of re-reading the files after moving them. I have done this with moving and checksumming, and in my case it was a huge performance bump.
Hope this helps.
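A quick first check, before reaching for a real profiler, is to compare CPU time with wall-clock time for the operation. The Python sketch below is one rough way to do that; `cpu_fraction` is a hypothetical helper, and the reading of the result (near 0 suggests waiting on IO, near or above 1 suggests CPU-bound work) is a rule of thumb, not a precise measurement.

```python
import time


def cpu_fraction(fn, *args):
    """Run fn(*args) and return CPU time divided by wall-clock time.

    process_time() counts CPU time for the whole process (all threads),
    so the ratio can exceed 1.0 when several cores are busy.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0
```

A low ratio on the file-copy step and a high one on the zip step would fit the picture painted in the other answers: parallelism is more likely to pay off on the compression than on the copying.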
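To make the "streaming file mover" idea concrete, here is a minimal Python sketch that reads the source once, and in that single pass both hashes it and writes a gzip-compressed copy to the destination. This is not the code I used; the function name `move_compress_checksum`, the choice of gzip and SHA-256, and the 1 MiB chunk size are all assumptions for illustration.

```python
import gzip
import hashlib
import os


def move_compress_checksum(src, dest_gz, chunk_size=1024 * 1024):
    """Read src once: hash it, gzip it to dest_gz, then delete src.

    Returns the SHA-256 hex digest of the original (uncompressed) data.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, gzip.open(dest_gz, "wb") as fout:
        # Each chunk is read from disk only once and used twice.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
            fout.write(chunk)
    # Only remove the source after the compressed copy is fully written.
    os.remove(src)
    return digest.hexdigest()
```

Compared with "move, then re-read to compress, then re-read to checksum", this touches the source data once, which is where the IO saving comes from.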
I have an application implemented in WinForms that processes ~7,800 URLs in approximately 5 minutes (it downloads each URL, parses the content, looks for specific pieces of data and, if it finds what it's looking for, does some additional processing of that data).
This specific application used to take between 26 and 30 minutes to run, but after changing the code to use the TPL (Task Parallel Library in .NET v4.0) it executes in just 5. The computer is a Dell T7500 workstation with dual quad-core Xeon processors (3 GHz), 24 GB of RAM, and Windows 7 Ultimate 64-bit edition.
Though it's not exactly the same as your situation, this too is extremely IO intensive. The documentation on the TPL states it was originally conceived for processor-bound problem sets, but this doesn't rule out using it in IO situations (as my application demonstrates to me). If you have at least 4 cores and you're not seeing your processing time drop significantly, then it's possible you have other implementation issues that are preventing the TPL from really being efficient (locks, hard drive items, etc.). The book Parallel Programming with Microsoft .NET really helped me to understand "how" your code needs to be modified to really take advantage of all that power.
Worth a look in my opinion.
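For anyone wondering why a network-heavy job like this can speed up so much, the gain mostly comes from overlapping the time spent waiting on each download. The Python sketch below shows that pattern with a thread pool. It is not my application's code: `process_urls`, `fetch`, the `handle` callback and the default of 16 workers are hypothetical, chosen only to illustrate the shape of it.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def process_urls(urls, handle, fetch=fetch, workers=16):
    """Download each URL and pass (url, body) to handle, several at a time.

    While one worker waits on the network, others can download or parse.
    Results come back in the same order as urls.
    """
    def work(url):
        return handle(url, fetch(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, urls))
```

The `fetch` parameter is injectable so the download step can be swapped out or faked. The right worker count depends on the remote servers and the network link, so treat 16 as a placeholder to tune.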