Why doesn't WhereSelectArrayIterator implement ICollection?
While looking at System.Linq.Enumerable through Reflector, I noticed that the default iterator used for the Select and Where extension methods, WhereSelectArrayIterator, does not implement the ICollection interface (nor does its List<T> counterpart, WhereSelectListIterator, shown below). If I read the code correctly, this causes some other extension methods, such as Count() and ToList(), to perform more slowly:
// System.Linq.Enumerable (excerpt, as decompiled by Reflector)
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    // code above snipped
    if (source is List<TSource>)
    {
        return new WhereSelectListIterator<TSource, TResult>((List<TSource>) source, null, selector);
    }
    // code below snipped
}

private class WhereSelectListIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    // Fields
    private List<TSource> source; // class has access to the List source, so it could implement ICollection
    // code below snipped
}

// System.Collections.Generic.List<T> (excerpt)
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
    public List(IEnumerable<T> collection)
    {
        ICollection<T> is2 = collection as ICollection<T>;
        if (is2 != null)
        {
            // Count is known: allocate the exact size once and bulk-copy.
            int count = is2.Count;
            this._items = new T[count];
            is2.CopyTo(this._items, 0); // FAST
            this._size = count;
        }
        else
        {
            // Count is unknown: start small and add items one at a time.
            this._size = 0;
            this._items = new T[4];
            using (IEnumerator<T> enumerator = collection.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    this.Add(enumerator.Current); // SLOW, CAUSES ARRAY EXPANSION
                }
            }
        }
    }
}
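A quick way to see which branch of that constructor a Select result hits is to ask whether it implements ICollection<T>, which is exactly the check the constructor performs. The sketch below (C# 9+ top-level statements; the variable names are illustrative) does that for a List<int> and for the Select over it. On the framework versions discussed here, and as far as I know on current .NET as well, the Select result is not an ICollection<int>, so copying it goes through the enumerate-and-Add branch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3, 4, 5 };

// Both are viewed through IEnumerable<int>, the parameter type of the
// List<T>(IEnumerable<T>) constructor.
IEnumerable<int> original = source;
IEnumerable<int> projected = source.Select(k => k);

// List<T> implements ICollection<T>: the constructor can read Count,
// allocate once and call CopyTo.
bool sourceIsCollection = original is ICollection<int>;

// The Select iterator does not: the constructor has to enumerate it and
// call Add, growing the backing array as it goes.
bool projectedIsCollection = projected is ICollection<int>;

var copy = new List<int>(projected);

Console.WriteLine($"List<int> is ICollection<int>: {sourceIsCollection}");
Console.WriteLine($"Select(...) result is ICollection<int>: {projectedIsCollection}");
Console.WriteLine($"Copied {copy.Count} items");
```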
I've tested this with results confirming my suspicion:
ICollection: 2388.5222 ms
IEnumerable: 3308.3382 ms
Here's the test code:
// prepare source
var n = 10000;
var source = new List<int>(n);
for (int i = 0; i < n; i++) source.Add(i);

// Test List creation using ICollection
var startTime = DateTime.Now;
for (int i = 0; i < n; i++)
{
    foreach (int l in source.Select(k => k)) { } // iterate to make the comparison fair
    new List<int>(source);
}
var finishTime = DateTime.Now;
Response.Write("ICollection: " + (finishTime - startTime).TotalMilliseconds + " ms <br />");

// Test List creation using IEnumerable
startTime = DateTime.Now;
for (int i = 0; i < n; i++) new List<int>(source.Select(k => k));
finishTime = DateTime.Now;
Response.Write("IEnumerable: " + (finishTime - startTime).TotalMilliseconds + " ms");
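DateTime.Now has fairly coarse resolution, so System.Diagnostics.Stopwatch is the usual tool for this kind of measurement. The sketch below re-runs the same two tests with Stopwatch and Console output (C# 9+ top-level statements) and adds a third, hypothetical workaround: because a plain Select yields exactly source.Count items, the list can be presized so the enumerating copy never has to grow its backing array. No timings are claimed here; they depend on the machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int n = 10000;
var source = new List<int>(n);
for (int i = 0; i < n; i++) source.Add(i);

// Variant 1: copy through ICollection<T> (Count + CopyTo), plus one Select
// enumeration so this variant also pays for iterating the projection.
var sw = Stopwatch.StartNew();
List<int> viaCollection = new List<int>();
for (int i = 0; i < n; i++)
{
    foreach (int l in source.Select(k => k)) { }
    viaCollection = new List<int>(source);
}
sw.Stop();
Console.WriteLine($"ICollection: {sw.Elapsed.TotalMilliseconds} ms");

// Variant 2: copy the Select result directly; the constructor must enumerate
// it and grow its backing array as items are added.
sw.Restart();
List<int> viaEnumerable = new List<int>();
for (int i = 0; i < n; i++) viaEnumerable = new List<int>(source.Select(k => k));
sw.Stop();
Console.WriteLine($"IEnumerable: {sw.Elapsed.TotalMilliseconds} ms");

// Variant 3 (workaround): presize to the known count; AddRange still
// enumerates, but the backing array is never reallocated.
sw.Restart();
List<int> presized = new List<int>();
for (int i = 0; i < n; i++)
{
    presized = new List<int>(source.Count);
    presized.AddRange(source.Select(k => k));
}
sw.Stop();
Console.WriteLine($"Presized IEnumerable: {sw.Elapsed.TotalMilliseconds} ms");
```

The workaround only applies when the caller knows the result size, which is true for a plain Select but not once a Where is involved.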
Am I missing something, or will this be fixed in a future version of the framework?
Thank you for your thoughts.
LINQ to Objects uses some tricks to optimize certain operations. For example, if you chain two .Where calls together, the predicates are combined into a single WhereArrayIterator, so the earlier iterators can be garbage collected. Likewise, a Where followed by a Select creates a WhereSelectArrayIterator, passing the combined predicate as an argument so that the original WhereArrayIterator can be garbage collected. So WhereSelectArrayIterator is responsible for tracking not only the selector but also the combined predicate, if there is one (a plain Select passes null for it, as in the code in the question).

The source field only keeps track of the initial list that was given. Because of the predicate, the iteration result will not always have the same number of items as source does. Since LINQ is intended to be lazily evaluated, it shouldn't evaluate source against the predicate ahead of time just so that it can potentially save time if someone ends up calling .Count(). That would cause just as much of a performance hit as calling .ToList() on it manually, and if the user ran the sequence through multiple Where and Select clauses, you'd end up constructing multiple lists unnecessarily.

Could LINQ to Objects be refactored to create a SelectArrayIterator that it uses when Select gets called directly on an array? Sure. Would it enhance performance? A little bit. At what cost? Less code reuse means additional code to maintain and test moving forward.

And thus we get to the crux of the vast majority of "Why doesn't language/platform X have feature Y?" questions: every feature and optimization has some cost associated with it, and even Microsoft doesn't have unlimited resources. Just like every other company out there, they make judgment calls about how often code that performs a Select on an array and then calls .ToList() on it will actually run, and whether making it run a little faster is worth writing and maintaining another class in the LINQ package.
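To make that trade-off concrete, here is a minimal sketch of what such a class could look like. SelectArrayCollection is a hypothetical name, not a System.Linq type: it wraps an array and a selector and implements ICollection<TResult>, so new List<TResult>(...) takes the Count/CopyTo path. It only works because there is no predicate: Count is simply the array length, which is exactly the shortcut a Where clause rules out. The demo uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };
var projected = new SelectArrayCollection<int, string>(numbers, x => (x * x).ToString());

// The List<T> constructor sees an ICollection<T>: it reads Count once,
// allocates exactly that many slots and calls CopyTo.
var list = new List<string>(projected);
Console.WriteLine(string.Join(", ", list));
Console.WriteLine($"Capacity: {list.Capacity}");

// Hypothetical: a predicate-free projection over an array that exposes
// ICollection<TResult>. Not part of System.Linq.
sealed class SelectArrayCollection<TSource, TResult> : ICollection<TResult>
{
    private readonly TSource[] _source;
    private readonly Func<TSource, TResult> _selector;

    public SelectArrayCollection(TSource[] source, Func<TSource, TResult> selector)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    // Without a predicate there is exactly one result per source element.
    public int Count => _source.Length;

    public bool IsReadOnly => true;

    // Runs the selector eagerly for every element, writing into the caller's array.
    public void CopyTo(TResult[] array, int arrayIndex)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if (arrayIndex < 0 || array.Length - arrayIndex < _source.Length)
            throw new ArgumentException("Destination array is too small.");
        for (int i = 0; i < _source.Length; i++)
            array[arrayIndex + i] = _selector(_source[i]);
    }

    // Has to project every element to compare it.
    public bool Contains(TResult item)
    {
        var comparer = EqualityComparer<TResult>.Default;
        foreach (TSource element in _source)
            if (comparer.Equals(_selector(element), item)) return true;
        return false;
    }

    // Plain enumeration stays lazy.
    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource element in _source)
            yield return _selector(element);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // A projection is read-only.
    public void Add(TResult item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(TResult item) => throw new NotSupportedException();
}
```

Even this stripped-down version has to provide Contains, CopyTo and three mutators that only throw, and each of them needs its own tests, which is the kind of ongoing maintenance cost the answer is weighing against a modest speedup.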