FindAll 与Where 扩展方法

发布于 2024-08-06 21:52:11 字数 220 浏览 6 评论 0原文

我只想知道“FindAll”是否比“Where”扩展方法更快，为什么？

示例：

myList.FindAll(item=> item.category == 5);

或

myList.Where(item=> item.category == 5);

哪个更好？

原文

I just want know if a "FindAll" will be faster than a "Where" extentionMethod and why?

Example :

myList.FindAll(item=> item.category == 5);

myList.Where(item=> item.category == 5);

Which is better ?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

月光色 2024-08-13 21:52:11

好吧，FindAll 将匹配的元素复制到一个新列表，而 Where 仅返回一个延迟计算的序列 - 不需要复制。

因此，我希望 Where 比 FindAll 稍快，即使结果序列已完全评估 - 当然还有 Where 的惰性评估策略> 意味着如果您只查看（比如说）第一个匹配项，则不需要检查列表的其余部分。（正如 Matthew 指出的，维护 Where 的状态机是有工作的。但是，这只会有固定的内存成本 - 而构造一个新列表可能需要多个数组分配等。）

基本上，< code>FindAll(predicate) 更接近 Where(predicate).ToList() 而不是 Where(predicate)。

只是为了对马修的回答做出更多反应，我认为他没有足够彻底地测试它。他的谓词恰好选择了一半的项目。这是一个简短但完整的程序，它测试相同的列表，但使用三个不同的谓词 - 一个不选择任何项目，一个选择所有项目，一个选择其中的一半。在每种情况下，我都运行测试五十次以获得更长的时间。

我使用 Count() 来确保完全评估 Where 结果。结果显示，收集了大约一半的结果，两者并驾齐驱。如果未收集到任何结果，FindAll 获胜。收集所有结果，Where获胜。我发现这很有趣：随着找到越来越多的匹配项，所有解决方案都变得更慢：FindAll 有更多的复制要做，而 Where 必须返回匹配的值而不是只是在 MoveNext() 实现中循环。然而，FindAll 的速度比 Where 慢，因此失去了早期的领先优势。非常有趣。

结果：（

FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891

使用 /o+ /debug- 编译并从命令行运行，.NET 3.5。）

代码：

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Test
{
    static List<int> ints = Enumerable.Range(0, 10000000).ToList();

    static void Main(string[] args)
    {
        Benchmark("All", i => i >= 0); // Match all
        Benchmark("Half", i => i % 2 == 0); // Match half
        Benchmark("None", i => i < 0); // Match none
    }

    static void Benchmark(string name, Predicate<int> predicate)
    {
        // We could just use new Func<int, bool>(predicate) but that
        // would create one delegate wrapping another.
        Func<int, bool> func = (Func<int, bool>) 
            Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
                                    predicate.Method);
        Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
        Benchmark("Where: " + name, () => ints.Where(func).Count());
    }

    static void Benchmark(string name, Action action)
    {
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 50; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }
}

Well, FindAll copies the matching elements to a new list, whereas Where just returns a lazily evaluated sequence - no copying is required.

I'd therefore expect Where to be slightly faster than FindAll even when the resulting sequence is fully evaluated - and of course the lazy evaluation strategy of Where means that if you only look at (say) the first match, it won't need to check the remainder of the list. (As Matthew points out, there's work in maintaining the state machine for Where. However, this will only have a fixed memory cost - whereas constructing a new list may require multiple array allocations etc.)

Basically, FindAll(predicate) is closer to Where(predicate).ToList() than to just Where(predicate).

Just to react a bit more to Matthew's answer, I don't think he's tested it quite thoroughly enough. His predicate happens to pick half the items. Here's a short but complete program which tests the same list but with three different predicates - one picks no items, one picks all the items, and one picks half of them. In each case I run the test fifty times to get longer timing.

I'm using Count() to make sure that the Where result is fully evaluated. The results show that collecting around half the results, the two are neck and neck. Collecting no results, FindAll wins. Collecting all the results, Where wins. I find this intriguing: all of the solutions become slower as more and more matches are found: FindAll has more copying to do, and Where has to return the matched values instead of just looping within the MoveNext() implementation. However, FindAll gets slower faster than Where does, so loses its early lead. Very interesting.

Results:

FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891

(Compiled with /o+ /debug- and run from the command line, .NET 3.5.)

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Test
{
    static List<int> ints = Enumerable.Range(0, 10000000).ToList();

    static void Main(string[] args)
    {
        Benchmark("All", i => i >= 0); // Match all
        Benchmark("Half", i => i % 2 == 0); // Match half
        Benchmark("None", i => i < 0); // Match none
    }

    static void Benchmark(string name, Predicate<int> predicate)
    {
        // We could just use new Func<int, bool>(predicate) but that
        // would create one delegate wrapping another.
        Func<int, bool> func = (Func<int, bool>) 
            Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
                                    predicate.Method);
        Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
        Benchmark("Where: " + name, () => ints.Where(func).Count());
    }

    static void Benchmark(string name, Action action)
    {
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 50; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }
}

回复收藏 0 原文

つ低調成傷 2024-08-13 21:52:11

我们测试而不是猜测怎么样？很遗憾看到错误的答案被出来。

var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();

var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();

Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284

Release
sw1: 532194
sw2: 1016524
*/

编辑：

即使我将上面的代码从

var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();

...转为...，

var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();

我也会得到这些结果

/*
Debug
sw1: 1250409
sw2: 1267016

Release
sw1: 539536
sw2: 600361
*/

编辑2.0...

如果您想要一个子集的列表当前列出最快的方法是FindAll()。原因很简单。 FindAll 实例方法使用当前 List 上的索引器而不是枚举器状态机。 Where() 扩展方法是对使用枚举器的不同类的外部调用。如果从列表中的每个节点步进到下一个节点，则必须在幕后调用 MoveNext() 方法。正如您从上面的示例中看到的，使用索引条目创建新列表（指向原始项目，因此内存膨胀将最小）甚至只获取过滤项目的计数甚至更快。

现在，如果您要从枚举器提前中止，Where() 方法可能会更快。当然，如果您将早期中止逻辑移至 FindAll() 方法的谓词，您将再次使用索引器而不是枚举器。

现在还有其他原因需要使用Where() 语句（例如其他linq 方法、foreach 块等等），但问题是FindAll() 比Where() 更快。除非您不执行Where()，否则答案似乎是肯定的。（在比较苹果与苹果时）

我并不是说不要使用 LINQ 或 .Where() 方法。它们使代码更易于阅读。问题是关于性能，而不是关于阅读和理解代码的难易程度。快速完成这项工作的最快方法是使用 for 块步进每个索引并根据需要执行任何逻辑（甚至提前退出）。 LINQ 之所以如此出色，是因为您可以使用它们获得复杂的表达式树和转换。但是使用 .Where() 方法中的迭代器必须执行大量代码才能找到内存状态机的路径，该状态机只是从列表中获取下一个索引。还应该注意的是，这个 .FindAll() 方法仅对实现它的对象（例如 Array 和 List）有用。

还有更多...

for (int x = 0; x < 20; x++)
{
    var ints = Enumerable.Range(0, 10000000).ToList();
    var sw1 = Stopwatch.StartNew();
    var findall = ints.FindAll(i => i % 2 == 0).Count;
    sw1.Stop();

    var sw2 = Stopwatch.StartNew();
    var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
    sw2.Stop();

    var sw4 = Stopwatch.StartNew();
    var cntForeach = 0;
    foreach (var item in ints)
        if (item % 2 == 0)
            cntForeach++; 
    sw4.Stop();

    Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
    Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
    Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}


/* averaged results
sw1 575446.8
sw2 605954.05
sw3 394506.4
/*

How about we test instead of guess? Shame to see the wrong answer get out.

var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();

var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();

Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284

Release
sw1: 532194
sw2: 1016524
*/

Edit:

Even if I turn the above code from

var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();

... to ...

var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();

I get these results

/*
Debug
sw1: 1250409
sw2: 1267016

Release
sw1: 539536
sw2: 600361
*/

Edit 2.0...

If you want a list of the subset of the current list the fastest method if the FindAll(). The reason for this is simple. The FindAll instance method uses the indexer on the current List instead of the enumerator state machine. The Where() extension method is an external call to a different class that uses the enumerator. If you step from each node in the list to the next node you will have to call the MoveNext() method under the covers. As you can see from the above examples it is even faster to use the index entries to create a new list (that is pointing to the original items, so memory bloat will be minimal) to even just get a count of the filtered items.

Now if you are going to early abort from the Enumerator the Where() method could be faster. Of course if you move the early abort logic to the predicate of the FindAll() method you will again be using the indexer instead of the enumerator.

Now there are other reasons to use the Where() statement (such as the other linq methods, foreach blocks and many more) but the question was is the FindAll() faster than Where(). And unless you don't execute the Where() the answer seems to be yes. (When comparing apples to apples)

I am not say don't use LINQ or the .Where() method. They make for code that is much simpler to read. The question was about performance and not about how easy you can read and understand the code. By fast the fastest way to do this work would be to use a for block stepping each index and doing any logic as you want (even early exits). The reason LINQ is so great is becasue of the complex expression trees and transformation you can get with them. But using the iterator from the .Where() method has to go though tons of code to find it's way to a in memory statemachine that is just getting the next index out of the List. It should also be noted that this .FindAll() method is only useful on objects that implmented it (such as Array and List.)

Yet more...

for (int x = 0; x < 20; x++)
{
    var ints = Enumerable.Range(0, 10000000).ToList();
    var sw1 = Stopwatch.StartNew();
    var findall = ints.FindAll(i => i % 2 == 0).Count;
    sw1.Stop();

    var sw2 = Stopwatch.StartNew();
    var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
    sw2.Stop();

    var sw4 = Stopwatch.StartNew();
    var cntForeach = 0;
    foreach (var item in ints)
        if (item % 2 == 0)
            cntForeach++; 
    sw4.Stop();

    Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
    Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
    Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}


/* averaged results
sw1 575446.8
sw2 605954.05
sw3 394506.4
/*

回复收藏 0 原文

遗失的美好 2024-08-13 21:52:11

好吧，至少你可以尝试测量它。

静态 Where 方法是使用迭代器块（yield 关键字）实现的，这基本上意味着执行将被推迟。如果仅比较对这两个方法的调用，第一个方法会更慢，因为它立即意味着将迭代整个集合。

但如果您包含所获得结果的完整迭代，情况可能会有所不同。我很确定 yield 解决方案速度较慢，因为它暗示了生成的状态机机制。（参见@Matthew anwser）

回复收藏 0 原文

抠脚大汉 2024-08-13 21:52:11

我可以提供一些线索，但不确定哪一个更快。
FindAll() 立即执行。
其中()被推迟执行。

回复收藏 0 原文

甜警司 2024-08-13 21:52:11

where的好处是延迟执行。如果您具有以下功能，请查看差异：

BigSequence.FindAll( x =>  DoIt(x) ).First();
BigSequence.Where( x => DoIt(x) ).First();

FindAll 已覆盖整个序列，而在大多数序列中，Where 会在找到一个元素后立即停止枚举。

使用 Any()、Take()、Skip() 等会产生相同的效果。我不确定，但我想您在所有具有延迟执行的函数中都会有巨大的优势

The advantage of where is the deferred execution. See the difference if you'd have the following functionality

BigSequence.FindAll( x =>  DoIt(x) ).First();
BigSequence.Where( x => DoIt(x) ).First();

FindAll has covered the complete sequene, while Where in most sequences will stop enumerating as soon as one element is found.

The same effects will be one using Any(), Take(), Skip(), etc. I'm not sure, but I guess you'll have huge advantages in all functions that have deferred execution

回复收藏 0 原文

~没有更多了~