What is the correct PLINQ syntax to convert this foreach loop to parallel execution?

Posted on 2024-11-08 22:44:03

Update 2011-05-20 12:49AM: The foreach is still 25% faster than the parallel solution for my application. Also, don't use the collection count as the maximum degree of parallelism; use something closer to the number of cores on your machine.

=

I have an IO-bound task that I would like to run in parallel. I want to apply the same operation to every file in a folder. Internally, the operation results in a Dispatcher.Invoke that adds the computed file info to a collection on the UI thread. So, in a sense, the result of the work is a side effect of the method call, not a value returned directly from it.

This is the core loop that I want to run in parallel:

foreach (ShellObject sf in sfcoll)
    ProcessShellObject(sf, curExeName);

The context for this loop is here:

        var curExeName = Path.GetFileName(Assembly.GetEntryAssembly().Location);
        using (ShellFileSystemFolder sfcoll = ShellFileSystemFolder.FromFolderPath(_rootPath))
        {
            //This works, but is not parallel.
            foreach (ShellObject sf in sfcoll)
                ProcessShellObject(sf, curExeName);

            //This doesn't work.
            //My attempt at PLINQ.  This code never calls method ProcessShellObject.

            var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
                        let p = ProcessShellObject(sf, curExeName)
                        select p;
        }

    private String ProcessShellObject(ShellObject sf, string curExeName)
    {
        String unusedReturnValueName = sf.ParsingName;
        try
        {
            DesktopItem di = new DesktopItem(sf);
            // Update DesktopItem stuff
            di.PropertyChanged += new PropertyChangedEventHandler(DesktopItem_PropertyChanged);
            // Marshal the Add onto the UI thread; Invoke blocks until it has run.
            ControlWindowHelper.MainWindow.Dispatcher.Invoke(
                (Action)(() => _desktopItemCollection.Add(di)));
        }
        catch (Exception ex)
        {
            // Exceptions are silently swallowed here.
        }
        return unusedReturnValueName;
    }

Thanks for any help!

+tom

Comments (3)

骄傲 2024-11-15 22:44:03

EDIT: Regarding the update to your question. I hadn't spotted that the task was IO-bound - and presumably all the files are from a single (traditional?) disk. Yes, that would go slower - because you're introducing contention in a non-parallelizable resource, forcing the disk to seek all over the place.

IO-bound tasks can still be parallelized effectively sometimes - but it depends on whether the resource itself is parallelizable. For example, an SSD (which has much smaller seek times) may completely change the characteristics you're seeing - or if you're fetching over the network from several individually-slow servers, you could be IO-bound but not on a single channel.


You've created a query, but never used it. The simplest way of forcing everything to be used with the query would be to use Count() or ToList(), or something similar. However, a better approach would be to use Parallel.ForEach:

var options = new ParallelOptions { MaxDegreeOfParallelism = sfcoll.Count() };
Parallel.ForEach(sfcoll, options, sf => ProcessShellObject(sf, curExeName));

I'm not sure that setting the max degree of parallelism like that is the right approach though. It may work, but I'm not sure. A different way of approaching this would be to start all the operations as tasks, specifying TaskCreationOptions.LongRunning.
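
As a rough sketch of the two alternatives mentioned above, the following caps Parallel.ForEach at the core count (as the question's update suggests) and, separately, starts one LongRunning task per item. ProcessItem, the string items, and the ConcurrentBag sink are hypothetical stand-ins for ProcessShellObject, the ShellObject collection, and the UI collection; they are assumptions made so the example is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Hypothetical stand-in for ProcessShellObject: the result is a side effect
    // (an item added to a thread-safe collection), not a return value.
    private static void ProcessItem(string item, ConcurrentBag<string> sink)
    {
        sink.Add(item.ToUpperInvariant());
    }

    // Parallel.ForEach with the degree of parallelism capped at the core count.
    public static ConcurrentBag<string> RunWithParallelForEach(IEnumerable<string> items)
    {
        var sink = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        Parallel.ForEach(items, options, item => ProcessItem(item, sink));
        return sink;
    }

    // One LongRunning task per item; Task.WaitAll blocks until all have finished.
    public static ConcurrentBag<string> RunWithLongRunningTasks(IEnumerable<string> items)
    {
        var sink = new ConcurrentBag<string>();
        Task[] tasks = items
            .Select(item => Task.Factory.StartNew(
                () => ProcessItem(item, sink),
                TaskCreationOptions.LongRunning))
            .ToArray();
        Task.WaitAll(tasks);
        return sink;
    }

    public static void Main()
    {
        var items = new[] { "a.txt", "b.txt", "c.txt" };
        Console.WriteLine(RunWithParallelForEach(items).Count);  // 3
        Console.WriteLine(RunWithLongRunningTasks(items).Count); // 3
    }
}
```

Both variants block the calling thread until all the work is done. Since ProcessShellObject calls Dispatcher.Invoke, they would presumably need to be started from a non-UI thread; otherwise the blocked UI thread may be unable to service the Invoke calls.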

骑趴 2024-11-15 22:44:03

The query object you created via LINQ is an IEnumerable. It is evaluated only when you enumerate it (e.g. via a foreach loop):

        var query = from sf in sfcoll.AsParallel().WithDegreeOfParallelism(sfcoll.Count())
                    let p = ProcessShellObject(sf, curExeName)
                    select p;
        foreach(var q in query) 
        {
            // ....
        }
        // or:
        var results = query.ToArray(); // also enumerates query
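
When only the side effects matter, PLINQ's ForAll is another way to enumerate the query. The sketch below is a minimal illustration of the deferred execution described above: nothing runs when the query is built, and every item is processed once ForAll enumerates it. ProcessItem and the string items are hypothetical stand-ins for ProcessShellObject and the ShellObject collection, and the core-count cap follows the question's update rather than the original query.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class DeferredPlinqSketch
{
    // Hypothetical stand-in for ProcessShellObject.
    private static string ProcessItem(string item)
    {
        return item.ToUpperInvariant();
    }

    // Returns the number of ProcessItem calls observed before and after
    // the query is enumerated.
    public static int[] CountCalls(string[] items)
    {
        int calls = 0;

        // Building the query does not run ProcessItem.
        var query = items.AsParallel()
                         .WithDegreeOfParallelism(Environment.ProcessorCount)
                         .Select(item =>
                         {
                             Interlocked.Increment(ref calls);
                             return ProcessItem(item);
                         });
        int beforeEnumeration = calls;

        // ForAll enumerates the query in parallel and returns once every item is done.
        query.ForAll(result => { });
        int afterEnumeration = calls;

        return new[] { beforeEnumeration, afterEnumeration };
    }

    public static void Main()
    {
        int[] counts = CountCalls(new[] { "a.txt", "b.txt", "c.txt" });
        Console.WriteLine(counts[0]); // 0
        Console.WriteLine(counts[1]); // 3
    }
}
```

Unlike ToArray or ToList, ForAll does not collect the results into a new collection, which fits a case where the return value is unused.
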
桃扇骨 2024-11-15 22:44:03

Shouldn't you add a line like this at the end?

var results = query.ToList();