LINQ 查询性能与紧凑查询
在检查 LINQ 的功能时,我编写了简单的 QuickSort 实现和
很高兴最终快速排序功能适合一行。
然而我注意到这个“一行”函数的性能与我原来的“直接”版本有很大不同。
下面是循环调用快速排序函数 10 次的代码:
var r = new Random(DateTime.Now.Millisecond);
Stopwatch watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
watch.Reset();
var randomA = new int[100].Select(x => r.Next(100)).ToList();
watch.Start();
var sorted = QuickSort<int>(randomA);
watch.Stop();
Console.WriteLine("Duration: {0} ms", watch.ElapsedMilliseconds);
}
Console.ReadLine();
以及输出结果的 QuickSort 函数的两个实现:
简单版本:
IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
if (a.Count() <= 1) return a;
var pivot = a.First();
IEnumerable<T> lesser = a.Skip(1).Where(x => x.CompareTo(pivot) < 0);
IEnumerable<T> bigger = a.Skip(1).Where(x => x.CompareTo(pivot) >= 0);
return QuickSort(lesser).Concat(new T[] { pivot }).Concat(QuickSort(bigger));
}
输出
持续时间:22 毫秒
持续时间:3 毫秒
持续时间:3 毫秒
持续时间:3 毫秒
持续时间:3 毫秒
持续时间:3 毫秒
持续时间:2 毫秒
持续时间:2 毫秒
持续时间:3 毫秒
持续时间:3 ms
单行实现:
IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
return a.Count() <= 1 ? a :
QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) < 0)).
Concat(new T[] { a.First() }).
Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) >= 0)));
}
输出
持续时间:24154 毫秒
持续时间:407 毫秒
持续时间:2281 毫秒
持续时间:2420 毫秒
持续时间:919 毫秒
持续时间:48777 毫秒
持续时间:4615 毫秒
持续时间:3115 毫秒
持续时间:1311 毫秒
持续时间:1631 ms
为什么性能差异如此之大?
While checking LINQ's abilities I've wrote simple QuickSort implementation and
was glad that ultimately quick sort function fits in one line.
However I've noticed that performance of this "one line" function significantly differ from my original "straight-forward" version.
Here is the code that calls quick sort function in a loop 10 times:
var r = new Random(DateTime.Now.Millisecond);
Stopwatch watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
watch.Reset();
var randomA = new int[100].Select(x => r.Next(100)).ToList();
watch.Start();
var sorted = QuickSort<int>(randomA);
watch.Stop();
Console.WriteLine("Duration: {0} ms", watch.ElapsedMilliseconds);
}
Console.ReadLine();
And two implementations of QuickSort function with Output results:
Simple version:
IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
if (a.Count() <= 1) return a;
var pivot = a.First();
IEnumerable<T> lesser = a.Skip(1).Where(x => x.CompareTo(pivot) < 0);
IEnumerable<T> bigger = a.Skip(1).Where(x => x.CompareTo(pivot) >= 0);
return QuickSort(lesser).Concat(new T[] { pivot }).Concat(QuickSort(bigger));
}
Output
Duration: 22 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 3 ms
Duration: 2 ms
Duration: 2 ms
Duration: 3 ms
Duration: 3 ms
One Line implementation:
IEnumerable<T> QuickSort<T>(IEnumerable<T> a) where T : IComparable<T>
{
return a.Count() <= 1 ? a :
QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) < 0)).
Concat(new T[] { a.First() }).
Concat(QuickSort(a.Skip(1).Where(i => i.CompareTo(a.First()) >= 0)));
}
Output
Duration: 24154 ms
Duration: 407 ms
Duration: 2281 ms
Duration: 2420 ms
Duration: 919 ms
Duration: 48777 ms
Duration: 4615 ms
Duration: 3115 ms
Duration: 1311 ms
Duration: 1631 ms
Why such a big difference in the performance?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
原因是您每次都重新计算
a.First()
- 如果您像这样再次将其分解:那么性能是相同的。
为了通过实验来澄清 - 使用拦截器和
First()
的(简单的,只是为了这个实验,不要在家里使用它,这是错误的)实现,我们可以看到实际上有多少次调用First()
。这表明,在“单行”中
a.First()
被调用 236376 次,而在我们分解a.First()< 的版本中/code> 它被调用98次。这解释了性能差异。
The reason is that you are re-calculating
a.First()
everytime - if you factor this out again like this:Then the performance is the same.
To clarify with an experiment - using an interceptor with a (trivialized, just for this experiment, don't use this at home, it's wrong) implementation of
First()
we can see how many times in factFirst()
is called.This reveals that in the "one liner"
a.First()
is called 236376 times, while in the version where we factor outa.First()
it is called 98 times. This explains the performance difference.一般来说,这两个示例的主要问题是(请注意,这并不能回答问题,因为更改此操作会对这两个示例产生影响):
在这里,您只是检查是否有一个项目,但是,您正在调用
Count
。如果IEnumerable
实现确实实现了ICollection
,则每次都会枚举列表。由于您递归地调用此函数,因此当 您不这样做时,您最终会迭代序列中的所有项目真的不必。
相反,您应该这样做:
这样,您最多只迭代两个项目(跳过第一个,检查第二个是否存在)来确定序列中是否最多有一个项目。
请注意,对于较大的
N
值,您将遇到与Count
相同的问题;这是这里的一个优化,因为N
的值非常小。A major offender in general for both examples is this (note this does not answer the question, as changing this operation would have an impact on both examples):
Here, you are simply checking to see if you have one item, and yet, you are calling
Count
. In the event that theIEnumerable<T>
implementation does implementICollection<T>
, it will enumerate through the list every time.Since you are calling this recursively, you end up iterating through all of the items in the sequence when you don't really have to.
Rather, you should do this:
This way, you only iterate through two items at most (skipping the first, checking the existence of a second) to determine whether or not you have at most one item in the sequence.
Note that for large values of
N
, you will run into the same issues as you do withCount
; it is an optimization here because the value ofN
is so small.