Pattern for filtering an IEnumerable
Consider the following simple code pattern:
    foreach (Item item in itemList)
    {
        if (item.Foo)
        {
            DoStuff(item);
        }
    }
If I want to parallelize it using Parallel Extensions (PE), I might simply replace the foreach loop construct as follows:
    Parallel.ForEach(itemList, delegate(Item item)
    {
        if (item.Foo)
        {
            DoStuff(item);
        }
    });
However, PE will do unnecessary work assigning items to threads when Foo turns out to be false. So I was thinking an intermediate wrapper/filtering IEnumerable might be a reasonable approach here. Do you agree? If so, what is the simplest way of achieving this? (BTW, I'm currently using C# 2, so I'd be grateful for at least one example that doesn't use lambda expressions etc.)
Comments (3)
I'm not sure how the partitioning in PE for .NET 2 works, so it's difficult to say. If each element is pushed into a separate work item (which would be a fairly poor partitioning strategy), then filtering in advance would make quite a bit of sense.
If, however, item.Foo happened to be at all expensive (I wouldn't expect this, given that it's a property, but it's always possible), allowing it to be parallelized could be advantageous.
In addition, in .NET 4, the partitioning strategy used by the TPL will handle this fairly well. It was specifically designed to handle situations with varying levels of work. It partitions in "chunks", so items are not sent to threads one at a time; rather, a thread is assigned a set of items, which it processes in bulk. Depending on how often item.Foo is false, parallelizing (using TPL) would quite possibly be faster than filtering in advance.
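To make the "chunks" idea concrete, here is a minimal sketch that hands each worker a whole index range explicitly, using the .NET 4 Partitioner.Create(int, int) range partitioner. It only illustrates chunked processing and is not a description of what the TPL does internally by default. Item, DoStuff and ChunkedDemo are hypothetical stand-ins for the question's types, and only C# 2 syntax (anonymous delegates, no lambdas) is used.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Item type.
class Item
{
    public bool Foo;
    public Item(bool foo) { Foo = foo; }
}

static class ChunkedDemo
{
    static int processed;

    // Hypothetical stand-in for the question's DoStuff; it just counts calls.
    static void DoStuff(Item item)
    {
        Interlocked.Increment(ref processed);
    }

    public static int Run(List<Item> items)
    {
        processed = 0;
        if (items.Count == 0) return 0; // Partitioner.Create rejects an empty range

        // Each delegate call receives one index range (a chunk), not a single item.
        Parallel.ForEach(Partitioner.Create(0, items.Count),
            delegate(Tuple<int, int> range)
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    if (items[i].Foo) // the cheap check runs inside the chunk
                    {
                        DoStuff(items[i]);
                    }
                }
            });
        return processed;
    }

    static void Main()
    {
        List<Item> items = new List<Item>();
        for (int i = 0; i < 1000; i++) items.Add(new Item(i % 4 == 0));
        Console.WriteLine(Run(items)); // prints 250
    }
}
```

With this shape, the cost of scheduling is paid once per range, so items whose Foo is false cost little more than the property read itself.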
That all factors down to this single line:
But reading a comment on another post, I now see you're still on .NET 2.0, so some of this may be a bit tricky to sneak past the compiler.
For .NET 2.0, I think you can do it like this (I'm not entirely sure that passing the method names as delegates will just work, but I think it will):
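One possible shape for the .NET 2.0-style approach, as a hedged sketch: a C# 2 iterator block does the filtering lazily, and method names are passed where delegates are expected. Filter, HasFoo, FilterDemo and the Item/DoStuff stand-ins are all hypothetical names introduced for illustration. Parallel.ForEach here is the System.Threading.Tasks version from .NET 4, which is assumed to be close enough to the PE preview the question mentions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Item type.
class Item
{
    public bool Foo;
    public Item(bool foo) { Foo = foo; }
}

static class FilterDemo
{
    static int handled;

    // Hypothetical predicate, passed below by method name.
    static bool HasFoo(Item item) { return item.Foo; }

    // Hypothetical stand-in for the question's DoStuff; it just counts calls.
    static void DoStuff(Item item) { Interlocked.Increment(ref handled); }

    // C# 2 iterator block: lazily yields only the elements accepted by 'keep'.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> keep)
    {
        foreach (T element in source)
        {
            if (keep(element)) yield return element;
        }
    }

    public static int Run(IEnumerable<Item> items)
    {
        handled = 0;
        // Method groups convert to Predicate<Item> and Action<Item>;
        // the type arguments are spelled out so nothing depends on inference.
        Parallel.ForEach<Item>(Filter<Item>(items, HasFoo), DoStuff);
        return handled;
    }

    static void Main()
    {
        List<Item> items = new List<Item>();
        items.Add(new Item(true));
        items.Add(new Item(false));
        items.Add(new Item(true));
        Console.WriteLine(Run(items)); // prints 2
    }
}
```

Note that with this wrapper the predicate is evaluated as items are pulled from the iterator, not inside the parallel body, so only the items that pass the filter are handed out as work.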
If I were to implement this, I would simply filter the list before calling the foreach.
This will filter the list to get the items that can be acted upon.
NOTE: this might be a premature optimization, and it might not make a major difference to your performance.
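A minimal sketch of filtering up front, assuming itemList is a List<Item>: List<T>.FindAll has been available since .NET 2.0 and accepts an anonymous delegate, so no lambda is needed. Item, DoStuff and PreFilterDemo are hypothetical stand-ins, and Parallel.ForEach is the .NET 4 System.Threading.Tasks version.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Item type.
class Item
{
    public bool Foo;
    public Item(bool foo) { Foo = foo; }
}

static class PreFilterDemo
{
    static int handled;

    // Hypothetical stand-in for the question's DoStuff; it just counts calls.
    static void DoStuff(Item item) { Interlocked.Increment(ref handled); }

    public static int Run(List<Item> itemList)
    {
        handled = 0;

        // FindAll eagerly builds a new list holding only the items with Foo set.
        List<Item> actionable = itemList.FindAll(
            delegate(Item item) { return item.Foo; });

        // Only the pre-filtered items are handed to the parallel loop.
        Parallel.ForEach(actionable, delegate(Item item) { DoStuff(item); });
        return handled;
    }

    static void Main()
    {
        List<Item> itemList = new List<Item>();
        itemList.Add(new Item(true));
        itemList.Add(new Item(false));
        itemList.Add(new Item(true));
        Console.WriteLine(Run(itemList)); // prints 2
    }
}
```

The trade-off in this version is one extra sequential pass and a temporary list, in exchange for a parallel loop that never sees items it would skip.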