Linq to Objects: filtering performance question

Posted on 2024-09-27 09:50:54

I was thinking about the way LINQ evaluates queries, and it made me wonder:

If I write

var count = collection.Count(o => o.Category == 3);

Will that perform any differently than:

var count = collection.Where(o => o.Category == 3).Count();

After all, IEnumerable<T>.Where() returns an IEnumerable<T>, which has no Count property, so a subsequent Count() would have to walk through the items to determine the count, which should cost extra time.

I wrote some quick test code to get some metrics, but the two forms seem to beat each other at random. I won't post the test code here initially, but I'll add it if anyone asks.

So, am I missing something?

Comments (4)

哽咽笑 2024-10-04 09:50:54

There won't be a lot in it, really - both forms will iterate over the collection, check the predicate against each item, and count the matches. Both approaches will stream the data - it's not like Where is actually building an in-memory list of all matches, for example.

The first form has one fewer (thin) layer of indirection, that's all. The main reason for using it (IMO) is readability/simplicity rather than performance.
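To make the "both forms stream" point concrete, here is a minimal sketch of roughly what each form does. CountMatching, Filter and CountAll are hypothetical stand-ins written for illustration, not the framework's actual source, and the sample array is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Count(predicate): one loop, one counter.
int CountMatching<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    int count = 0;
    foreach (T item in source)
        if (predicate(item)) count++;
    return count;
}

// Hypothetical stand-in for Where: yields matches lazily and never builds a list.
IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (T item in source)
        if (predicate(item)) yield return item;
}

// Hypothetical stand-in for parameterless Count() on a lazy sequence.
int CountAll<T>(IEnumerable<T> source)
{
    int count = 0;
    foreach (T item in source) count++;
    return count;
}

int[] categories = { 3, 1, 3, 2, 3 };              // made-up sample data
int direct = CountMatching(categories, c => c == 3);
int piped = CountAll(Filter(categories, c => c == 3));
Console.WriteLine($"{direct} {piped}");            // prints "3 3"
```

Either way the source is walked exactly once; the piped form just pulls each match through one extra iterator.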

倾城°AllureLove 2024-10-04 09:50:54

As Jon Skeet says, both techniques essentially have to do the same thing - enumerate the sequence while conditionally incrementing a counter when the predicate is matched. Any performance difference between the two should be slight: insignificant for almost all use cases. If there is a token winner, though, I would think it should be the first one, since from Reflector it appears that the overload of Count that takes a predicate uses its own foreach to enumerate, rather than the more obvious way of offloading the work to a streaming Where piped into a parameterless Count, as in your second example. This means technique #1 is likely to have two minor performance benefits:

  1. Fewer argument-validation checks (null tests etc.). Technique #2's Count will also check whether its (piped) input is an ICollection or ICollection<T>, which it can't possibly be.
  2. A single constructed enumerator vs two enumerators piped together (an additional state-machine has costs).

There is one minor point in favour of technique #2, though: Where is slightly more sophisticated in constructing an enumerator for the source sequence; it uses a different one for lists and arrays. This may make it more performant in certain scenarios.

Of course, I should reiterate that I might be plain wrong about my analysis - reasoning about performance through static code analysis, especially when the differences are likely to be slight, is not a good idea. There is only one way to find out - measuring the execution times for your specific setup.

FYI, the source I reflected was from .NET 3.5 SP1.
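For anyone who wants to measure rather than guess, a rough Stopwatch sketch along these lines could be a starting point. The collection size, the number of rounds and the anonymous item type are assumptions made for illustration, and a quick loop like this is noisy, so treat the timings as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical data: one million items with categories 0..4.
var collection = Enumerable.Range(0, 1_000_000)
    .Select(i => new { Category = i % 5 })
    .ToList();

const int rounds = 20;

// Warm-up so JIT compilation is not charged to either form.
int direct = collection.Count(o => o.Category == 3);
int piped = collection.Where(o => o.Category == 3).Count();

// Time technique #1: Count with a predicate.
var sw = Stopwatch.StartNew();
for (int i = 0; i < rounds; i++)
    direct = collection.Count(o => o.Category == 3);
sw.Stop();
long directMs = sw.ElapsedMilliseconds;

// Time technique #2: Where piped into parameterless Count.
sw.Restart();
for (int i = 0; i < rounds; i++)
    piped = collection.Where(o => o.Category == 3).Count();
sw.Stop();
long pipedMs = sw.ElapsedMilliseconds;

Console.WriteLine($"Count(predicate): {directMs} ms, Where().Count(): {pipedMs} ms");
```

Run it several times, in a release build, before drawing any conclusion; results that "beat each other at random" usually mean the difference is below the noise level.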

苍暮颜 2024-10-04 09:50:54

I know what you are thinking here. At least, I think I do; Count() will look to see if Count is available as a property, and will simply return that if so. Otherwise, it has to enumerate the items to get its return value.

The version of Count() which accepts the predicate, though, will always cause the collection to be iterated, since it has to do it to see which ones match.
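A small sketch of that distinction, assuming a made-up list and a hypothetical Tracked helper that counts how many items are pulled through a lazy sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 3, 1, 3, 2, 3 };   // made-up sample data

int visited = 0;
// Hypothetical helper: a lazy wrapper that records every item pulled through it.
IEnumerable<int> Tracked(IEnumerable<int> source)
{
    foreach (int x in source) { visited++; yield return x; }
}

// List<T> exposes Count through ICollection<T>, so Count() can use the property.
int total = list.Count();

// A lazy sequence has no Count property, so Count() has to walk every item.
int lazyTotal = Tracked(list).Count();
int visitedByCount = visited;

// The predicate overload walks every item as well.
visited = 0;
int matches = Tracked(list).Count(x => x == 3);
int visitedByPredicate = visited;

Console.WriteLine($"{total} {lazyTotal} {visitedByCount} {matches} {visitedByPredicate}");
// prints "5 5 5 3 5"
```

The output of Where is a lazy sequence of this kind, which is why Where(...).Count() cannot take the property shortcut.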

手长情犹 2024-10-04 09:50:54

The answers above make good points. Consider also that if you move to any LINQ-to-X implementation that defers execution (LINQ to SQL being the primary one), the Expression parameters used in these methods may cause different results.
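As a rough illustration, the sketch below uses AsQueryable() over an in-memory range as a stand-in for a real provider such as LINQ to SQL (an assumption; no database is involved). On IQueryable<T> the predicate is passed as an expression tree, and what a provider does with the two query shapes is up to that provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for a real query provider: an in-memory sequence exposed as IQueryable<int>.
IQueryable<int> source = Enumerable.Range(0, 100).AsQueryable();

// On IQueryable<T> the predicate is an expression tree, not a compiled delegate.
Expression<Func<int, bool>> isThree = x => x % 5 == 3;

int direct = source.Count(isThree);
int piped = source.Where(isThree).Count();

// Where only extends the expression tree; nothing runs until a result is requested.
IQueryable<int> filtered = source.Where(isThree);
Console.WriteLine(filtered.Expression);
Console.WriteLine($"{direct} {piped}");
```

With this in-memory stand-in both forms return the same count; with a real provider the translated query may differ between the two shapes, so it is worth inspecting what the provider actually generates.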
