LINQ 查询性能与紧凑查询

发布于 2024-11-04 02:51:42 字数 1905 浏览 1 评论 0原文

在检查 LINQ 的功能时，我编写了简单的 QuickSort 实现和
很高兴最终快速排序功能适合一行。
然而我注意到这个“一行”函数的性能与我原来的“直接”版本有很大不同。

下面是循环调用快速排序函数 10 次的代码：

var r = new Random(DateTime.Now.Millisecond);
Stopwatch watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
    watch.Reset();
    var randomA = new int[100].Select(x => r.Next(100)).ToList();

    watch.Start();                
    var sorted = QuickSort<int>(randomA);
    watch.Stop();

    Console.WriteLine("Duration: {0} ms", watch.ElapsedMilliseconds);
}

Console.ReadLine();

以及输出结果的 QuickSort 函数的两个实现：

简单版本：

IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
    if (a.Count() <= 1) return a;

    var pivot = a.First();
    IEnumerable<T> lesser = a.Skip(1).Where(x => x.CompareTo(pivot) < 0);
    IEnumerable<T> bigger = a.Skip(1).Where(x => x.CompareTo(pivot) >= 0);

    return QuickSort(lesser).Concat(new T[] { pivot }).Concat(QuickSort(bigger));
}

输出
持续时间：22 毫秒
持续时间：3 毫秒
持续时间：3 毫秒
持续时间：3 毫秒
持续时间：3 毫秒
持续时间：3 毫秒
持续时间：2 毫秒
持续时间：2 毫秒
持续时间：3 毫秒
持续时间：3 ms

单行实现：

IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
    return a.Count() <= 1 ? a :                 
        QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) < 0)).
        Concat(new T[] { a.First() }).                          
        Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) >= 0)));  
}

输出
持续时间：24154 毫秒
持续时间：407 毫秒
持续时间：2281 毫秒
持续时间：2420 毫秒
持续时间：919 毫秒
持续时间：48777 毫秒
持续时间：4615 毫秒
持续时间：3115 毫秒
持续时间：1311 毫秒
持续时间：1631 ms

为什么性能差异如此之大？

原文

While checking LINQ's abilities I've wrote simple QuickSort implementation and
was glad that ultimately quick sort function fits in one line.
However I've noticed that performance of this "one line" function significantly differ from my original "straight-forward" version.

Here is the code that calls quick sort function in a loop 10 times:

var r = new Random(DateTime.Now.Millisecond);
Stopwatch watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
    watch.Reset();
    var randomA = new int[100].Select(x => r.Next(100)).ToList();

    watch.Start();                
    var sorted = QuickSort<int>(randomA);
    watch.Stop();

    Console.WriteLine("Duration: {0} ms", watch.ElapsedMilliseconds);
}

Console.ReadLine();

And two implementations of QuickSort function with Output results:

Simple version:

IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
    if (a.Count() <= 1) return a;

    var pivot = a.First();
    IEnumerable<T> lesser = a.Skip(1).Where(x => x.CompareTo(pivot) < 0);
    IEnumerable<T> bigger = a.Skip(1).Where(x => x.CompareTo(pivot) >= 0);

    return QuickSort(lesser).Concat(new T[] { pivot }).Concat(QuickSort(bigger));
}

Output
Duration: 22 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 2 ms
Duration: 2 ms
Duration: 3 ms
Duration: 3 ms

One Line implementation:

IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
    return a.Count() <= 1 ? a :                 
        QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) < 0)).
        Concat(new T[] { a.First() }).                          
        Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) >= 0)));  
}

Output
Duration: 24154 ms
Duration: 407 ms
Duration: 2281 ms
Duration: 2420 ms
Duration: 919 ms
Duration: 48777 ms
Duration: 4615 ms
Duration: 3115 ms
Duration: 1311 ms
Duration: 1631 ms

Why such a big difference in the performance?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

￡噩梦荏苒 2024-11-11 02:51:42

原因是您每次都重新计算 a.First() - 如果您像这样再次将其分解：

public static IEnumerable<T> QuickSort<T>(IEnumerable<T> a) 
    where T : IComparable<T>
{
    if (a.Count() <= 1) return a;
    var pivot = a.First();
    return QuickSort(a.Skip(1).Where(i => i.CompareTo(pivot) < 0))
        .Concat(new T[] { pivot })
        .Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(pivot) >= 0)));
}

那么性能是相同的。

为了通过实验来澄清 - 使用拦截器和 First() 的（简单的，只是为了这个实验，不要在家里使用它，这是错误的）实现，我们可以看到实际上有多少次调用 First()。

public static class TestHelper
{
    public static int firstCalledCounter = 0;
    public static TSource First<TSource>(this IEnumerable<TSource> source) 
    {
        firstCalledCounter++;
        Console.WriteLine("First called " + firstCalledCounter);
        IList<TSource> list = source as IList<TSource>;
        if (list != null)
        {
            if (list.Count > 0)
            {
                return list[0];
            }
            else return default(TSource);
        }
        else return default(TSource);
    }
}

这表明，在“单行”中 a.First() 被调用 236376 次，而在我们分解 a.First()< 的版本中/code> 它被调用98次。这解释了性能差异。

The reason is that you are re-calculating a.First() everytime - if you factor this out again like this:

public static IEnumerable<T> QuickSort<T>(IEnumerable<T> a) 
    where T : IComparable<T>
{
    if (a.Count() <= 1) return a;
    var pivot = a.First();
    return QuickSort(a.Skip(1).Where(i => i.CompareTo(pivot) < 0))
        .Concat(new T[] { pivot })
        .Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(pivot) >= 0)));
}

Then the performance is the same.

To clarify with an experiment - using an interceptor with a (trivialized, just for this experiment, don't use this at home, it's wrong) implementation of First() we can see how many times in fact First() is called.

public static class TestHelper
{
    public static int firstCalledCounter = 0;
    public static TSource First<TSource>(this IEnumerable<TSource> source) 
    {
        firstCalledCounter++;
        Console.WriteLine("First called " + firstCalledCounter);
        IList<TSource> list = source as IList<TSource>;
        if (list != null)
        {
            if (list.Count > 0)
            {
                return list[0];
            }
            else return default(TSource);
        }
        else return default(TSource);
    }
}

This reveals that in the "one liner" a.First() is called 236376 times, while in the version where we factor out a.First() it is called 98 times. This explains the performance difference.

回复收藏 0 原文

冷月断魂刀 2024-11-11 02:51:42

一般来说，这两个示例的主要问题是（请注意，这并不能回答问题，因为更改此操作会对这两个示例产生影响）：

if (a.Count() <= 1) return a;

在这里，您只是检查是否有一个项目，但是，您正在调用 Count。如果 IEnumerable 实现确实实现了 ICollection，则每次都会枚举列表。

由于您递归地调用此函数，因此当您不这样做时，您最终会迭代序列中的所有项目真的不必。

相反，您应该这样做：

if (!a.Skip(1).Any()) return a;

这样，您最多只迭代两个项目（跳过第一个，检查第二个是否存在）来确定序列中是否最多有一个项目。

请注意，对于较大的 N 值，您将遇到与 Count 相同的问题；这是这里的一个优化，因为N的值非常小。

A major offender in general for both examples is this (note this does not answer the question, as changing this operation would have an impact on both examples):

if (a.Count() <= 1) return a;

Here, you are simply checking to see if you have one item, and yet, you are calling Count. In the event that the IEnumerable<T> implementation does implement ICollection<T>, it will enumerate through the list every time.

Since you are calling this recursively, you end up iterating through all of the items in the sequence when you don't really have to.

Rather, you should do this:

if (!a.Skip(1).Any()) return a;

This way, you only iterate through two items at most (skipping the first, checking the existence of a second) to determine whether or not you have at most one item in the sequence.

Note that for large values of N, you will run into the same issues as you do with Count; it is an optimization here because the value of N is so small.

回复收藏 0 原文

~没有更多了~