Why doesn't WhereSelectArrayIterator implement ICollection?

Posted 2024-11-30 04:03:57

While looking at System.Linq.Enumerable through Reflector, I noticed that the default iterator used for the Select and Where extension methods, WhereSelectArrayIterator, does not implement the ICollection interface. If I read the code correctly, this causes some other extension methods, such as Count() and ToList(), to run more slowly:

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    // code above snipped
    if (source is List<TSource>)
    {
        return new WhereSelectListIterator<TSource, TResult>((List<TSource>) source, null, selector);
    }
    // code below snipped
}

private class WhereSelectListIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    // Fields
    private List<TSource> source; // class has access to List source so can implement ICollection
    // code below snipped
}


public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
    public List(IEnumerable<T> collection)
    {
        ICollection<T> is2 = collection as ICollection<T>;
        if (is2 != null)
        {
            int count = is2.Count;
            this._items = new T[count];
            is2.CopyTo(this._items, 0); // FAST
            this._size = count;
        }
        else
        {
            this._size = 0;
            this._items = new T[4];
            using (IEnumerator<T> enumerator = collection.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    this.Add(enumerator.Current);  // SLOW, CAUSES ARRAY EXPANSION
                }
            }
        }
    }
}
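
To see which of the two constructor branches a given sequence would take, a quick runtime check can help. This is only a sketch: `CollectionCheck` and `TakesFastPath` are made-up names, and what the iterator returned by Select implements may differ between framework versions, so the program prints what it finds rather than assuming an answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionCheck
{
    // Hypothetical helper: true when the sequence exposes ICollection<T>,
    // i.e. when new List<T>(sequence) can use the Count + CopyTo branch.
    public static bool TakesFastPath<T>(IEnumerable<T> sequence)
    {
        return sequence is ICollection<T>;
    }

    public static void Main()
    {
        var source = new List<int> { 1, 2, 3 };

        // A List<T> is itself an ICollection<T>.
        Console.WriteLine(TakesFastPath(source));

        // The projected sequence is an internal iterator type; print its
        // name and whether it exposes ICollection<T> on this runtime.
        IEnumerable<int> projected = source.Select(k => k);
        Console.WriteLine(projected.GetType().Name);
        Console.WriteLine(TakesFastPath(projected));
    }
}
```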

I've tested this, and the results confirm my suspicion:

ICollection: 2388.5222 ms

IEnumerable: 3308.3382 ms

Here's the test code:

    // prepare source
    var n = 10000;
    var source = new List<int>(n);
    for (int i = 0; i < n; i++) source.Add(i);

    // Test List creation using ICollection
    var startTime = DateTime.Now;
    for (int i = 0; i < n; i++)
    {
        foreach(int l in source.Select(k => k)); // iterate to make comparison fair
        new List<int>(source);
    }
    var finishTime = DateTime.Now;
    Response.Write("ICollection: " + (finishTime - startTime).TotalMilliseconds + " ms <br />");

    // Test List creation using IEnumerable
    startTime = DateTime.Now;
    for (int i = 0; i < n; i++) new List<int>(source.Select(k => k));
    finishTime = DateTime.Now;
    Response.Write("IEnumerable: " + (finishTime - startTime).TotalMilliseconds + " ms");
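
DateTime.Now has fairly coarse resolution, so the same comparison could also be timed with System.Diagnostics.Stopwatch. The version below is a sketch of that idea as a console program; `ListCreationTiming` and `Time` are names invented here, the warm-up call is an addition of mine, and the numbers it prints will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ListCreationTiming
{
    // Calls the action once as a warm-up (so JIT compilation is not measured),
    // then times the given number of further calls and returns milliseconds.
    public static double Time(int iterations, Action action)
    {
        action();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int n = 10000;
        List<int> source = Enumerable.Range(0, n).ToList();

        double viaCollection = Time(n, () =>
        {
            foreach (int item in source.Select(k => k)) { } // iterate once, as in the test above
            GC.KeepAlive(new List<int>(source));
        });

        double viaEnumerable = Time(n, () =>
        {
            GC.KeepAlive(new List<int>(source.Select(k => k)));
        });

        Console.WriteLine("ICollection: " + viaCollection + " ms");
        Console.WriteLine("IEnumerable: " + viaEnumerable + " ms");
    }
}
```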

Am I missing something, or will this be fixed in future versions of the framework?

Thank you for your thoughts.

Comments (1)

梦里兽 2024-12-07 04:03:57

LINQ to Objects uses some tricks to optimize certain operations. For example, if you chain two .Where statements together, the predicates will be combined into a single WhereArrayIterator, so the earlier iterators can be garbage collected. Likewise, a Where followed by a Select will create a WhereSelectArrayIterator, passing the combined predicates as an argument so that the original WhereArrayIterator can be garbage collected. So the WhereSelectArrayIterator is responsible for tracking not only the selector, but also the combined predicate that it may or may not be based on.
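
The predicate-combining idea can be sketched in a few lines. `PredicateCombining.Combine` is a hypothetical helper written for illustration, not the framework's internal code; it only shows that one merged delegate filters the same way as two chained Where calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateCombining
{
    // Hypothetical helper: merges two predicates into a single delegate
    // that passes only items accepted by both.
    public static Func<T, bool> Combine<T>(Func<T, bool> first, Func<T, bool> second)
    {
        return x => first(x) && second(x);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        Func<int, bool> isEven = x => x % 2 == 0;
        Func<int, bool> isLarge = x => x > 5;

        IEnumerable<int> chained = numbers.Where(isEven).Where(isLarge);
        IEnumerable<int> combined = numbers.Where(Combine(isEven, isLarge));

        Console.WriteLine(string.Join(", ", chained));     // 6, 8, 10, 12
        Console.WriteLine(string.Join(", ", combined));    // 6, 8, 10, 12
        Console.WriteLine(chained.SequenceEqual(combined)); // True
    }
}
```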

The source field only keeps track of the initial list that was given. Because of the predicate, the iteration result will not always have the same number of items as source does. Since LINQ is intended to be lazily evaluated, it shouldn't evaluate the source against the predicate ahead of time just so that it can potentially save time if someone ends up calling .Count(). That would cause just as much of a performance hit as calling .ToList() on it manually, and if the user ran it through multiple Where and Select clauses, you'd end up constructing multiple lists unnecessarily.
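
A small experiment makes both points visible: the filtered count differs from the source count, and the predicate does not run until something like Count() forces enumeration. `LazyCountDemo` and `Measure` are names made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyCountDemo
{
    // Returns { predicate calls before Count(), result of Count(),
    // predicate calls after Count() } for an "even numbers" filter.
    public static int[] Measure(IEnumerable<int> source)
    {
        int calls = 0;
        IEnumerable<int> query = source
            .Where(x => { calls++; return x % 2 == 0; })
            .Select(x => x * 10);

        int callsBefore = calls;         // building the query runs nothing
        int resultCount = query.Count(); // forces a full pass over source
        return new[] { callsBefore, resultCount, calls };
    }

    public static void Main()
    {
        int[] result = Measure(new List<int> { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine("Predicate calls before Count(): " + result[0]); // 0
        Console.WriteLine("Items after filtering: " + result[1]);          // 3
        Console.WriteLine("Predicate calls after Count(): " + result[2]);  // 6
    }
}
```

The source has six items but the result has three, so the iterator cannot report a count without running the predicate over every element.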

Could LINQ to Objects be refactored to create a SelectArrayIterator that it uses when Select gets called directly on an array? Sure. Would it enhance performance? A little bit. At what cost? Less code reuse means additional code to maintain and test moving forward.
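
To give a feel for what such a Select-only type might involve, here is a rough sketch. `SelectCollection<TSource, TResult>` is entirely hypothetical and not part of System.Linq; it assumes an IList<TSource> source (which covers both arrays and List<T>) and is read-only. Because no predicate is involved, its Count is simply the source's Count.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type for illustration only: a lazy, read-only projection
// over a list that can report its size without running the selector.
public sealed class SelectCollection<TSource, TResult> : ICollection<TResult>
{
    private readonly IList<TSource> source;
    private readonly Func<TSource, TResult> selector;

    public SelectCollection(IList<TSource> source, Func<TSource, TResult> selector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");
        this.source = source;
        this.selector = selector;
    }

    public int Count { get { return source.Count; } }
    public bool IsReadOnly { get { return true; } }

    // Projects every source item straight into the destination array.
    public void CopyTo(TResult[] array, int arrayIndex)
    {
        if (array == null) throw new ArgumentNullException("array");
        if (arrayIndex < 0 || array.Length - arrayIndex < source.Count)
            throw new ArgumentException("Destination array is too small.");
        for (int i = 0; i < source.Count; i++) array[arrayIndex + i] = selector(source[i]);
    }

    public bool Contains(TResult item)
    {
        EqualityComparer<TResult> comparer = EqualityComparer<TResult>.Default;
        foreach (TResult value in this) if (comparer.Equals(value, item)) return true;
        return false;
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in source) yield return selector(item);
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // Mutation makes no sense for a projection.
    public void Add(TResult item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(TResult item) { throw new NotSupportedException(); }
}

public static class SelectCollectionDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        var projected = new SelectCollection<int, string>(numbers, i => "#" + i);

        // List<T>'s constructor sees an ICollection<T> and uses Count + CopyTo.
        var list = new List<string>(projected);
        Console.WriteLine(string.Join(", ", list)); // #1, #2, #3
    }
}
```

Even this small version needs nine members plus tests, and it has a behavioral wrinkle: Count never invokes the selector, so a selector with side effects would behave differently than it does with a plain iterator. That is the kind of maintenance cost being weighed here.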

And thus we get to the crux of the vast majority of "Why doesn't language/platform X have feature Y" questions: every feature and optimization has some cost associated with it, and even Microsoft doesn't have unlimited resources. Just like every other company out there, they make judgment calls to determine how often code will be run that performs a Select on an array and then calls .ToList() on it, and whether making that run a little faster is worth writing and maintaining another class in the LINQ package.
