I am trying to execute a function in parallel on a list of objects using the new .NET 4.0 Parallel.ForEach function. This is a very long maintenance process. I would like it to execute in the order of the list so that I can stop it and later resume execution from the point where it left off. How do I do this?
Here is an example. I have a list of objects: a1 to a100. This is the current order:
a1, a51, a2, a52, a3, a53...
I want this order:
a1, a2, a3, a4...
I am OK with some objects running out of order, as long as I can find a point in the list where I can say that all objects before that point have run. I read the parallel programming C# whitepaper and didn't see anything about it. There isn't a setting for this in the ParallelOptions class.
Do something like this:
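A minimal sketch of this approach, assuming an IList<T> of work items and hypothetical stopRequested and work callbacks (ResumableFor and its parameter names are illustrative, not from the original answer). It runs Parallel.For from a start index, calls state.Break() once a stop is requested, and returns the index from which a later run can resume.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ResumableFor
{
    // Processes items[start..] in parallel until stopRequested() returns true.
    // Returns the index to resume from: every item below it has been processed.
    public static int Run<T>(IList<T> items, int start, Func<bool> stopRequested, Action<T> work)
    {
        ParallelLoopResult result = Parallel.For(start, items.Count, (i, state) =>
        {
            if (stopRequested())
            {
                // Break() still lets lower-numbered iterations run; if they also see
                // the stop request they break too, which only lowers the resume index.
                state.Break();
                return;
            }
            work(items[i]);
        });

        // LowestBreakIteration is null when Break() was never called.
        return result.LowestBreakIteration.HasValue
            ? (int)result.LowestBreakIteration.Value
            : items.Count;
    }
}
```

Items at or after the returned index may already have run during the first pass, so the work should tolerate being repeated when you resume from that index.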
You can see how, when you break out of the parallel for loop, you will know the last list item to be executed, assuming you let all threads finish prior to breaking. I'm not a big fan of PLINQ or LINQ. I honestly don't see how writing LINQ/PLINQ leads to maintainable or readable source code. Parallel.For is a much better solution.
If you use ParallelLoopState.Break to terminate the loop, then you are guaranteed that all indices below the returned value (ParallelLoopResult.LowestBreakIteration) will have been executed. This is about as close as you can get. The example here uses For, but ForEach has similar overloads. In a ForEach loop, an iteration index is generated internally for each element in each partition. Execution takes place out of order, but after a break you know that all the iterations lower than LowestBreakIteration will have been completed.
Taken from "Parallel Programming with Microsoft .NET" http://parallelpatterns.codeplex.com/
Available on MSDN. See http://msdn.microsoft.com/en-us/library/ff963552.aspx. The section "Breaking out of loops early" covers this scenario.
See also: http://msdn.microsoft.com/en-us/library/dd460721.aspx
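A minimal sketch of the ForEach variant, assuming an IList<T> source and hypothetical shouldStop and work callbacks (BreakAndResume and its parameter names are illustrative). It returns LowestBreakIteration, which is null if Break() was never called; otherwise every item at a lower position has run, while items at or above it may or may not have run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BreakAndResume
{
    // Runs work(item) in parallel until shouldStop(item) asks for a break.
    // Returns ParallelLoopResult.LowestBreakIteration: every index below it
    // is guaranteed to have run; null means Break() was never called.
    public static long? ProcessUntilBreak<T>(IList<T> items, Func<T, bool> shouldStop, Action<T> work)
    {
        ParallelLoopResult result = Parallel.ForEach(items, (item, state) =>
        {
            if (shouldStop(item))
            {
                state.Break(); // lower-indexed iterations will still be executed
                return;
            }
            work(item);
        });
        return result.LowestBreakIteration;
    }
}
```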
For anyone else who comes across this question: if you're looping over an array or list (rather than an IEnumerable), you can use the overload of Parallel.ForEach that gives the element index, which lets you maintain the original order too.
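A minimal sketch of that overload, assuming the goal is to keep the results in source order (IndexedParallel.MapInOrder is an illustrative name). The body receives (item, state, index), and each result is written to the slot matching its index, so the output matches the input order even though the items are processed out of order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IndexedParallel
{
    public static TResult[] MapInOrder<TSource, TResult>(IList<TSource> source, Func<TSource, TResult> selector)
    {
        var results = new TResult[source.Count];
        // This overload passes the element's position (a long) as the third argument.
        Parallel.ForEach(source, (item, state, index) =>
        {
            results[index] = selector(item); // each slot is written by exactly one iteration
        });
        return results;
    }
}
```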
As an alternative suggestion, you could record which objects have been run and then, when you resume execution, filter the list to exclude the objects that have already run.
If this needs to persist across application restarts, you can store the IDs of the already executed objects (I assume here that the objects have some unique identifier).
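A minimal sketch of that idea, assuming each object can be identified by an int ID and that a plain text file with one ID per line is acceptable storage (CompletedIdLog, ResumableRunner and the file format are illustrative, not from the original answer). An ID is recorded only after its work has finished, and on resume the list is filtered before the parallel loop starts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public sealed class CompletedIdLog
{
    private readonly string _path;
    private readonly object _gate = new object();
    private readonly HashSet<int> _done;

    public CompletedIdLog(string path)
    {
        _path = path;
        // Reload IDs recorded by earlier runs (one ID per line).
        _done = File.Exists(path)
            ? new HashSet<int>(File.ReadAllLines(path).Select(line => int.Parse(line)))
            : new HashSet<int>();
    }

    public bool IsDone(int id)
    {
        lock (_gate) return _done.Contains(id);
    }

    public void MarkDone(int id)
    {
        // Serialize updates so concurrent iterations do not interleave file writes.
        lock (_gate)
        {
            if (_done.Add(id)) File.AppendAllLines(_path, new[] { id.ToString() });
        }
    }
}

public static class ResumableRunner
{
    public static void Run(IEnumerable<int> ids, CompletedIdLog log, Action<int> work)
    {
        // Skip everything that an earlier run already finished.
        List<int> pending = ids.Where(id => !log.IsDone(id)).ToList();
        Parallel.ForEach(pending, id =>
        {
            work(id);
            log.MarkDone(id); // record only after the work has completed
        });
    }
}
```

Writing to the file inside a lock for every item is simple but slow; for large lists you may prefer to flush IDs in batches.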
For anybody looking for a simple solution, I have posted 2 extension methods (one using PLINQ and one using Parallel.ForEach) as part of an answer to the following question: Ordered PLINQ ForAll
It seems that you are searching for a partitioner that starts from the first item and takes one item at a time until all the items are consumed. Starting from .NET Framework 4.5, such a partitioner already exists in the standard .NET libraries. You can simply use the Partitioner.Create method, configured with the EnumerablePartitionerOptions.NoBuffering option:
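A minimal sketch of that configuration, assuming a generic source and a hypothetical thread-safe work callback (OrderedProcessing.ProcessInRoughOrder is an illustrative name). It also records the order in which iterations started, to make the "roughly ascending" behavior observable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class OrderedProcessing
{
    // Returns the items in the order in which their iterations started.
    public static List<T> ProcessInRoughOrder<T>(IEnumerable<T> source, Action<T> work,
                                                 int maxDegreeOfParallelism = -1)
    {
        var startOrder = new ConcurrentQueue<T>();

        // NoBuffering: each worker pulls a single item from the shared enumerator
        // at a time, instead of grabbing chunks or pre-assigned index ranges.
        OrderablePartitioner<T> partitioner =
            Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        Parallel.ForEach(partitioner, options, item =>
        {
            startOrder.Enqueue(item);
            work(item);
        });
        return startOrder.ToList();
    }
}
```

With a single worker (MaxDegreeOfParallelism = 1) the items start in exact list order; with more workers they start in roughly ascending order, because items are still handed out one at a time in list order.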
Each time a worker thread completes the processing of an item, it will take the next available item from the source collection. This ensures that the items will be processed in roughly ascending order. You can find here more info about the various partitioning strategies employed by the Parallel APIs (range partitioning, chunk partitioning etc).
If you want to stop the parallel execution prematurely, one idea is to call state.Break() inside the lambda. After the completion of the Parallel.ForEach you can examine the property ParallelLoopResult.LowestBreakIteration, and see whether Break was called and what was the lowest index that called it.
Another idea is to deny the partitioner unlimited access to the elements of the source collection. As a stopping mechanism you could use a CancellationTokenSource, and as a circuit breaker the TakeWhile LINQ operator:
...and then pass the limited sequence to Partitioner.Create. Call cts.Cancel() to stop the processing.
A third idea is to use the ParallelOptions.CancellationToken property:
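A minimal sketch of the last two stopping mechanisms, assuming int items and a hypothetical work callback (StoppableProcessing and its method names are illustrative, not from the original answer). The TakeWhile version lets the loop finish normally, and because NoBuffering hands items out in list order, the processed items form a contiguous prefix of the list. The CancellationToken version ends with an OperationCanceledException that has to be caught.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StoppableProcessing
{
    // Circuit breaker: once cts is cancelled, TakeWhile ends the sequence, so the
    // partitioner cannot pull further items; items already taken still finish.
    public static List<int> RunWithCircuitBreaker(IEnumerable<int> source, CancellationTokenSource cts, Action<int> work)
    {
        var processed = new ConcurrentBag<int>();
        IEnumerable<int> limited = source.TakeWhile(_ => !cts.IsCancellationRequested);
        Parallel.ForEach(
            Partitioner.Create(limited, EnumerablePartitionerOptions.NoBuffering),
            item => { work(item); processed.Add(item); });
        return processed.OrderBy(x => x).ToList();
    }

    // ParallelOptions.CancellationToken: after Cancel(), no new iterations start
    // and Parallel.ForEach throws OperationCanceledException.
    public static List<int> RunWithCancellationToken(IEnumerable<int> source, CancellationTokenSource cts, Action<int> work)
    {
        var processed = new ConcurrentBag<int>();
        var options = new ParallelOptions { CancellationToken = cts.Token };
        try
        {
            Parallel.ForEach(
                Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering),
                options,
                item => { work(item); processed.Add(item); });
        }
        catch (OperationCanceledException)
        {
            // Expected when cts.Cancel() was called during the loop.
        }
        return processed.OrderBy(x => x).ToList();
    }
}
```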
Not sure if the question was altered, as my comment seems wrong.
Here is an improved version. Basically, remember that parallel jobs run in an order outside your control.
E.g. printing 10 numbers might result in 1,4,6,7,2,3,9,0.
If you want to stop your program and continue later, problems like this usually end up batching the workload and keeping some log of what was done.
Say you had to check 10,000 numbers for primality.
You could loop in batches of size 100 and keep prime logs log1, log2, log3, ...:
log1 = 0..99
log2 = 100..199
Be sure to set some marker so you know whether a batch job has finished.
It's a general approach, since the question isn't that exact either.
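A minimal sketch of that batching approach, assuming a list of items, a batch size, and a plain text progress file that stores the start index of each finished batch (BatchRunner, progressLogPath and maxBatches are illustrative; maxBatches exists only to simulate stopping early). Each batch runs in parallel, and its marker is written only after the whole batch has completed, so a crash mid-batch means that batch is redone on the next run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class BatchRunner
{
    public static void Run<T>(IReadOnlyList<T> items, int batchSize, string progressLogPath,
                              Action<T> work, int maxBatches = int.MaxValue)
    {
        // Start indices of batches that finished in earlier runs.
        var finished = File.Exists(progressLogPath)
            ? new HashSet<int>(File.ReadAllLines(progressLogPath).Select(line => int.Parse(line)))
            : new HashSet<int>();

        int batchesRun = 0;
        for (int start = 0; start < items.Count && batchesRun < maxBatches; start += batchSize)
        {
            if (finished.Contains(start)) continue; // already done in an earlier run

            int end = Math.Min(start + batchSize, items.Count);
            Parallel.For(start, end, i => work(items[i])); // order inside a batch is not controlled

            // Marker written only after every item of the batch has completed.
            File.AppendAllLines(progressLogPath, new[] { start.ToString() });
            batchesRun++;
        }
    }
}
```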