
C# .NET IO parallel-processing parallel-extensions

Parallel Extensions

Posted on 2024-10-22 08:41:35

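I have an application that does heavy IO: copying files, compressing them, moving them around the file system, and copying them to backup servers.

I built the program single-threaded. It runs in 2 minutes.

I built another version of the program with Parallel Extensions and Tasks; it also runs in almost 2 minutes.

In other words, because of the heavy IO, I saw no performance gain from parallelizing.

Would I get the same results if I deployed the application to a blade server?

Do blade servers process IO faster, or over more channels, than my workstation?

Is there any benefit to using Parallel Extensions with IO-bound applications?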


只有一腔孤勇 2024-10-29 08:41:35

If all you're doing is copying or moving files across the system, then the parallelism provided by the TPL isn't going to do you much good. Moving a file, for example, uses almost no CPU; it simply changes the file's location in the disk's directory record structure.

File compression is a different story. Here you're loading data and using the CPU to compress it before saving it out to disk. You might be able to use a pipeline or parallel loop to load/compress/save the data in a more efficient way. Instead of having one thread work on compressing each file you could have multiple threads working on different files.

The following code compresses a set of files sequentially and then in parallel. On an i7 920 with an Intel X25 SSD, compressing 329 JPG images totalling 800 MB of data, I get the following times:

Sequential: 39901ms

Parallel: 12404ms

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo(@"C:\temp");

        Stopwatch sw = new Stopwatch();
        sw.Start();
        foreach (FileInfo fi in di.GetFiles("*.jpg"))
        {
            Compress(fi);
        }
        sw.Stop();
        Console.WriteLine("Sequential: " + sw.ElapsedMilliseconds);

        Console.WriteLine("Delete the results files and then rerun...");
        Console.ReadKey();

        sw.Reset();
        sw.Start();
        Parallel.ForEach(di.GetFiles("*.jpg"), (fi) => { Compress(fi); });
        sw.Stop();

        Console.WriteLine("Parallel: " + sw.ElapsedMilliseconds);
        Console.ReadKey();
    }

    public static void Compress(FileInfo fi)
    {
        // Skip hidden files and files that are already compressed,
        // before opening anything.
        if ((fi.Attributes & FileAttributes.Hidden) == FileAttributes.Hidden
            || fi.Extension == ".gz")
        {
            return;
        }

        // Stream the source file through gzip into "<name>.gz" next to it.
        using (FileStream inFile = fi.OpenRead())
        using (FileStream outFile = File.Create(fi.FullName + ".gz"))
        using (GZipStream gzip = new GZipStream(outFile, CompressionMode.Compress))
        {
            inFile.CopyTo(gzip);
        }
    }
}

For the compression code see How to: Compress Files
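
The pipeline alternative mentioned above might look roughly like the following. This is only a sketch, assuming .NET 4.0 or later; `CompressionPipeline`, `Run`, `Gzip`, and the queue capacity of 4 are hypothetical names and values chosen for illustration, not part of the original answer. One task loads files, several tasks compress them in memory, and one task saves the results, with `BlockingCollection<T>` queues between the stages.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class CompressionPipeline
{
    // Loads each file, gzips it in memory, and writes "<path>.gz".
    public static void Run(IEnumerable<string> paths, int compressors)
    {
        // Bounded queues limit memory use if one stage falls behind.
        var loaded = new BlockingCollection<KeyValuePair<string, byte[]>>(4);
        var compressed = new BlockingCollection<KeyValuePair<string, byte[]>>(4);

        // Stage 1: a single reader, so the disk sees one read at a time.
        Task load = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string path in paths)
                    loaded.Add(new KeyValuePair<string, byte[]>(path, File.ReadAllBytes(path)));
            }
            finally { loaded.CompleteAdding(); }
        }, TaskCreationOptions.LongRunning);

        // Stage 2: several CPU-bound compressors sharing the input queue.
        Task[] compress = Enumerable.Range(0, compressors).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                foreach (var item in loaded.GetConsumingEnumerable())
                    compressed.Add(new KeyValuePair<string, byte[]>(item.Key + ".gz", Gzip(item.Value)));
            }, TaskCreationOptions.LongRunning)).ToArray();

        // Stage 3: a single writer.
        Task save = Task.Factory.StartNew(() =>
        {
            foreach (var item in compressed.GetConsumingEnumerable())
                File.WriteAllBytes(item.Key, item.Value);
        }, TaskCreationOptions.LongRunning);

        // Signal the writer once every compressor has finished, even on failure.
        try { Task.WaitAll(compress); }
        finally { compressed.CompleteAdding(); }
        Task.WaitAll(load, save);
    }

    // Compresses a byte array with gzip and returns the compressed bytes.
    public static byte[] Gzip(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return buffer.ToArray(); // valid even after the stream is closed
        }
    }
}
```

A call such as `CompressionPipeline.Run(Directory.GetFiles(@"C:\temp", "*.jpg"), Environment.ProcessorCount)` would process the same folder as the code above. Because each file is held in memory as a whole, this shape suits many medium-sized files better than a few very large ones.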


ま柒月 2024-10-29 08:41:35

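If you are moving files around on a single physical device, issuing multiple parallel IO requests to that device will not buy you much. The device is already many orders of magnitude slower than the CPU, so requests issued in parallel still queue up and are serviced one at a time. Your parallel code is effectively serialized, because it all hits the same device, and that device cannot really handle more than one request at once.

If your disk controller implements elevator seeking, scatter-gather, or other out-of-order operations, you may see a small improvement from the parallel code, but the difference will be relatively minor.

Where you should see a more worthwhile difference in file I/O is when you are moving files between many different physical devices. You should be able to move or copy a file on disk A to another location on disk A while also copying a file from disk B to disk C. With many physical devices, your parallel requests do not all pile up waiting on one device to service them.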
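
One way to apply this idea is to let copies from different devices run in parallel while keeping copies from the same device sequential. The sketch below assumes that the drive root returned by `Path.GetPathRoot` is a reasonable stand-in for a physical device, which is not always true (two drive letters can be partitions of one disk, and the destination device is ignored here). `PerDeviceCopier` and `CopyJob` are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class PerDeviceCopier
{
    // A single copy request: source file and destination file.
    public sealed class CopyJob
    {
        public string Source;
        public string Destination;
    }

    // Groups jobs by the root of the source path (for example "C:\" or "D:\").
    // Assumption: one root corresponds to one physical device.
    public static ILookup<string, CopyJob> GroupBySourceRoot(IEnumerable<CopyJob> jobs)
    {
        return jobs.ToLookup(
            j => Path.GetPathRoot(Path.GetFullPath(j.Source)),
            StringComparer.OrdinalIgnoreCase);
    }

    // Copies all jobs with at most one copy in flight per source root.
    public static void CopyAll(IEnumerable<CopyJob> jobs)
    {
        Parallel.ForEach(GroupBySourceRoot(jobs), group =>
        {
            // Jobs that share a root run one after another, so they do not
            // compete with each other for the same device.
            foreach (CopyJob job in group)
                File.Copy(job.Source, job.Destination, true);
        });
    }
}
```

If all files live on one drive, this degenerates to a plain sequential loop, which matches the point made above: there is little to gain from parallel requests against a single device.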

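You will probably see similar results with network I/O: if everything goes through one Ethernet card or network segment, you will not achieve as much parallelism as you would with multiple Ethernet cards and multiple segments.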


甜｀诱少女 2024-10-29 08:41:35

I think the advantage of Parallel Extensions can be significant for CPU-bound operations. I don't know how it would affect IO, though.


浊酒尽余欢 2024-10-29 08:41:35

It all depends on whether you are CPU-bound or IO-bound. I would suggest doing some performance testing to see where your bottlenecks are.
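
A rough way to run such a test is to time a pass that only reads the files against a pass that reads and compresses them, discarding the output in both cases. This is a minimal sketch, not a proper benchmark; `BottleneckProbe` and its method names are hypothetical, and compression merely stands in for whatever CPU work the real application does.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class BottleneckProbe
{
    // Time spent just reading every file (output is discarded).
    public static TimeSpan TimeReadOnly(IEnumerable<string> paths)
    {
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string path in paths)
            using (FileStream input = File.OpenRead(path))
                input.CopyTo(Stream.Null);
        return sw.Elapsed;
    }

    // Time spent reading and gzip-compressing every file (output is discarded).
    public static TimeSpan TimeReadAndCompress(IEnumerable<string> paths)
    {
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string path in paths)
            using (FileStream input = File.OpenRead(path))
            using (var gzip = new GZipStream(Stream.Null, CompressionMode.Compress, true))
                input.CopyTo(gzip);
        return sw.Elapsed;
    }
}
```

If the two timings are close, the work is dominated by IO and extra threads are unlikely to help; if the second is much larger, the CPU is the bottleneck and a parallel loop has room to pay off. The operating system's file cache makes whichever pass runs second look faster, so it is worth doing a warm-up pass first or repeating the measurement in both orders.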

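If you find that you are moving and compressing a lot of files (to different disks, since a move on the same disk is only a file-table (FAT) update), you may want to look at implementing a streaming file mover that compresses as it moves. That saves the extra IO of re-reading the files after moving them. I have done this with moving and checksumming, and in my case it was a huge performance gain.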
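
A streaming mover of this kind could be sketched as follows: the source is read once, gzip-compressed on the fly, and written straight to the destination, and only then is the source deleted. `CompressingMover` and `MoveCompressed` are hypothetical names; this is not the answerer's implementation, and it leaves out the checksum step.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class CompressingMover
{
    // Reads 'source' once, writes a gzip-compressed copy to 'destination',
    // then deletes 'source'. The data never has to be re-read after the move.
    public static void MoveCompressed(string source, string destination)
    {
        using (FileStream input = File.OpenRead(source))
        using (FileStream output = File.Create(destination))
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip);
        }

        // Reached only if the copy completed and all streams were closed,
        // so a failed copy never removes the original.
        File.Delete(source);
    }
}
```

For example, `CompressingMover.MoveCompressed(@"C:\temp\photo.jpg", @"D:\backup\photo.jpg.gz")` would replace a separate move followed by a compress pass; both paths here are made up for the example.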

Hope this helps.


夜夜流光相皎洁 2024-10-29 08:41:35

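I have an application implemented in WinForms that processes about 7,800 URLs in roughly 5 minutes: it downloads each URL, parses the content, looks for specific pieces of data, and, if it finds what it is looking for, does some additional processing on that data.

This particular application used to take 26 to 30 minutes to run, but after changing the code to use the TPL (Task Parallel Library in .NET v4.0) it executes in just 5 minutes. The computer is a Dell T7500 workstation with dual quad-core Xeon processors (3 GHz), 24 GB of RAM, and Windows 7 Ultimate 64-bit.

It is not exactly the same as your situation, but it is also very IO-intensive. The TPL documentation states that it was originally conceived for processor-bound problem sets, but that does not rule out using it in IO situations (as my application demonstrates to me). If you have at least 4 cores and are not seeing your processing time drop significantly, you may have other implementation issues that are preventing the TPL from being really efficient (locks, hard drive issues, etc.). The book Parallel Programming with Microsoft .NET really helped me understand how my code needed to be modified to take full advantage of all this power.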
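
The general shape of such a conversion might be sketched like this. It is not the answerer's code: `UrlProcessor`, `ProcessAll`, and the `fetch` delegate are hypothetical, with `fetch` standing in for the download-and-parse step, and the result recorded here (the length of each response) is just a placeholder for the real processing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class UrlProcessor
{
    // Runs 'fetch' for every URL with at most 'maxParallel' calls in flight
    // and records the length of each result.
    public static ConcurrentDictionary<string, int> ProcessAll(
        IEnumerable<string> urls, Func<string, string> fetch, int maxParallel)
    {
        // A thread-safe collection avoids explicit locks around shared results.
        var lengths = new ConcurrentDictionary<string, int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls, options, url =>
        {
            string body = fetch(url); // the thread blocks here while waiting on IO
            lengths[url] = body.Length;
        });

        return lengths;
    }
}
```

In a real program `fetch` could be something like `url => { using (var client = new WebClient()) return client.DownloadString(url); }`. Because each worker spends most of its time waiting on the network, a `maxParallel` value above the core count can be reasonable for this kind of work; on .NET Framework the per-host limit in `ServicePointManager.DefaultConnectionLimit` may also cap how many downloads actually run at once.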

Worth a look, in my opinion.
