Parallel.ForEach 问题

Posted 2024-11-07 12:50:29 · 1,067 characters · 0 views · 0 comments


I am using a Parallel.ForEach loop in C# / VS2010 to do processing and I have a couple of questions.

First of all, I have a process that needs to extract information from a remote web service and then build images (GDI) on the fly.

I have a class that encapsulates all of the functionality into a single object with two main methods, Load() and CreateImage(), with all the GDI management / WebRequests "black-boxed" inside this object.

I then create a generic List containing all the objects that need to be processed, and I iterate through the list using the following code:

try
{
    Parallel.ForEach(MyLGenericList, ParallelOptions, (MyObject, loopState) =>
    {
        MyObject.DoLoad();
        MyObject.CreateImage();
        MyObject.Dispose();

        if (loopState.ShouldExitCurrentIteration || loopState.IsExceptional)
            loopState.Stop();
    });
}
catch (OperationCanceledException ex)
{
    // Cancel here
}
catch (Exception ex)
{
    throw;  // rethrow without resetting the stack trace
}
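The snippet references a `ParallelOptions` variable that isn't shown. A minimal sketch of what that setup typically looks like (the property names are the real System.Threading.Tasks API, but the values and the `cts` name are illustrative assumptions, not from the original post):

```csharp
// Hypothetical ParallelOptions setup assumed by the snippet above.
var cts = new CancellationTokenSource();
var ParallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    CancellationToken = cts.Token  // makes the OperationCanceledException catch reachable
};
```

Without a `CancellationToken` wired in, the `catch (OperationCanceledException)` block would never fire.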

Now my questions are:

  1. Given that there could be ten thousand items in the list to process, is the above code the best way to approach this? Any other ideas are more than welcome.
  2. I have an issue whereby when I start the process the objects are created/loaded and the images are created very fast, but after around six hundred objects the process starts to crawl. It does eventually finish, but is this normal?

Thanks in advance :)
Adam


Comments (2)

ゞ记忆︶ㄣ 2024-11-14 12:50:29


I am not sure that downloading data in parallel is a good idea, since it will block a lot of threads. Split your task into a producer and a consumer instead; then you can parallelize each of them separately.

Here is an example of a single producer and multiple consumers.
(If the consumers are faster than the producer, you can just use a normal foreach instead of Parallel.ForEach on the consumer side.)

var sources = new BlockingCollection<SourceData>();
var producer = Task.Factory.StartNew(
    () => {
        foreach (var item in MyGenericList) {
            var data = webservice.FetchData(item);
            sources.Add(data);
        }
        sources.CompleteAdding();
    });
Parallel.ForEach(sources.GetConsumingPartitioner(),
                 data => {
                     imageCreator.CreateImage(data);
                 });

(the GetConsumingPartitioner extension method is part of ParallelExtensionsExtras)

Edit: a more complete example:

var sources = new BlockingCollection<SourceData>();

var producerOptions = new ParallelOptions { MaxDegreeOfParallelism = 5 };
var consumerOptions = new ParallelOptions { MaxDegreeOfParallelism = -1 };

var producers = Task.Factory.StartNew(
    () => {
        Parallel.ForEach(MyLGenericList, producerOptions,
            myObject => {
                myObject.DoLoad();
                sources.Add(myObject);
            });
        sources.CompleteAdding();
    });
Parallel.ForEach(sources.GetConsumingPartitioner(), consumerOptions,
    myObject => {
        myObject.CreateImage();
        myObject.Dispose();
    });

With this code you can tune the number of parallel downloads while keeping the CPU busy with the image processing.

傻比既视感 2024-11-14 12:50:29


The Parallel.ForEach method with the default settings works best when the work the loop body does is CPU-bound. If you block, or hand the work off to another party synchronously, the scheduler thinks the CPU still isn't busy and keeps cramming in more tasks, trying hard to use all the CPUs in the system.

In your case, since your loop isn't going to saturate the CPUs, you just need to pick a reasonable number of downloads to overlap in parallel and set that value in your ForEach options.
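That advice can be sketched in code as follows; the cap of 8 is an illustrative value to tune against your bandwidth, and the object/list names are carried over from the question:

```csharp
// Cap concurrent downloads instead of letting the scheduler keep injecting
// threads while the loop body blocks on IO.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

Parallel.ForEach(MyLGenericList, options, myObject =>
{
    myObject.DoLoad();       // IO-bound: at most 8 requests in flight
    myObject.CreateImage();  // CPU-bound part still runs in parallel
    myObject.Dispose();
});
```

This keeps the thread pool from growing unboundedly, which is the likely cause of the slowdown after a few hundred items.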
