What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

Posted 2024-09-01 01:46:45

I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable LINQ-to-Objects provider. Further, let's assume that any Func passed in as a selector / mutator / etc. is a cheap O(1) operation.

It seems obvious that all the single-pass operations (Select, Where, Count, Take/Skip, Any/All, etc.) will be O(n), since they only need to walk the sequence once; although even this is subject to laziness.
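A small sketch of the laziness point, assuming C# 9+ top-level statements; `Naturals` and the `pulled` counter are illustrative names, not part of LINQ. Building the query touches nothing, and `Take(3)` stops the walk early, so the "O(n)" of a single-pass operator is really O(number of elements actually pulled):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many elements have been pulled from the source so far.
int pulled = 0;

// An unbounded iterator: it only produces a value when someone asks for one.
IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
    {
        pulled++;
        yield return i;
    }
}

// Building the query does no work yet: Where and Select are deferred.
var query = Naturals().Where(n => n % 2 == 0).Select(n => n * 10);
Console.WriteLine(pulled); // 0

// Take(3) ends the walk as soon as three results exist.
var firstThree = query.Take(3).ToList();
Console.WriteLine(string.Join(", ", firstThree)); // 0, 20, 40
Console.WriteLine(pulled); // 5: only the source elements 0..4 were ever produced
```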

Things are murkier for the more complex operations; the set-like operators (Union, Distinct, Except, etc.) work using GetHashCode by default (afaik), so it seems reasonable to assume they're using a hash-table internally, making these operations O(n) as well, in general. What about the versions that use an IEqualityComparer?

OrderBy would need a sort, so most likely we're looking at O(n log n). What if it's already sorted? How about if I say OrderBy().ThenBy() and provide the same key to both?

I could see GroupBy (and Join) using either sorting, or hashing. Which is it?

Contains would be O(n) on a List, but O(1) on a HashSet - does LINQ check the underlying container to see if it can speed things up?

And the real question - so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?

More questions (in response to comments):
Hadn't really thought about overhead, but I didn't expect there to be very much for simple Linq-to-Objects. The CodingHorror post is talking about Linq-to-SQL, where I can understand parsing the query and making SQL would add cost - is there a similar cost for the Objects provider too? If so, is it different if you're using the declarative or functional syntax?

Comments (5)

一腔孤↑勇 2024-09-08 01:46:45

There are very, very few guarantees, but there are a few optimizations:

  • Extension methods that use indexed access, such as ElementAt, Skip, Last or LastOrDefault, will check to see whether or not the underlying type implements IList<T>, so that you get O(1) access instead of O(N).

  • The Count method checks for an ICollection implementation, so that this operation is O(1) instead of O(N).

  • Distinct, GroupBy, Join, and I believe also the set-aggregation methods (Union, Intersect and Except) use hashing, so they should be close to O(N) instead of O(N²).

  • Contains checks for an ICollection<T> implementation, so it may be O(1) if the underlying collection's own Contains is O(1), as with a HashSet<T>, but this depends on the actual data structure and is not guaranteed. HashSet<T> implements ICollection<T>.Contains as a hash lookup, which is why it is O(1).

  • OrderBy methods use a quicksort that is made stable by breaking ties on the elements' original positions, so they're O(N log N) in the average case.
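A quick check of the stability claim, which also bears on the question's `OrderBy().ThenBy()` case: the documentation describes `OrderBy` as a stable sort, so items with equal keys keep their input order, and a `ThenBy` on the same key can never break a tie (it only adds comparison work). The tuples below are made-up data:

```csharp
using System;
using System.Linq;

// (Key, Label) pairs; several items share a key.
var items = new[]
{
    (Key: 2, Label: "a"),
    (Key: 1, Label: "b"),
    (Key: 2, Label: "c"),
    (Key: 1, Label: "d"),
};

// Stable sort: among equal keys, the original relative order survives.
var byKey = items.OrderBy(x => x.Key).Select(x => x.Label).ToArray();
Console.WriteLine(string.Join(" ", byKey)); // b d a c

// ThenBy on the same key cannot reorder anything OrderBy left tied.
var redundant = items.OrderBy(x => x.Key).ThenBy(x => x.Key).Select(x => x.Label).ToArray();
Console.WriteLine(string.Join(" ", redundant)); // b d a c
```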

I think that covers most if not all of the built-in extension methods. There really are very few performance guarantees; Linq itself will try to take advantage of efficient data structures but it isn't a free pass to write potentially inefficient code.
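To make the hashing point concrete for `GroupBy`: the sketch below is a hypothetical one-pass grouping built on `Dictionary<TKey, List<T>>` (`GroupBySketch` is an illustrative name; the BCL uses its own internal lookup type, not this code). Each element costs one expected O(1) dictionary operation, so the whole pass is roughly O(N). It also reproduces `GroupBy`'s documented ordering: groups appear in order of each key's first occurrence, and elements keep their source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical hash-based grouping: one pass, one dictionary lookup per element.
List<(TKey Key, List<T> Items)> GroupBySketch<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    where TKey : notnull
{
    var buckets = new Dictionary<TKey, List<T>>();
    var keyOrder = new List<TKey>(); // Dictionary enumeration order is not guaranteed, so track it.
    foreach (var item in source)
    {
        var key = keySelector(item);
        if (!buckets.TryGetValue(key, out var bucket))
        {
            bucket = new List<T>();
            buckets.Add(key, bucket);
            keyOrder.Add(key);
        }
        bucket.Add(item);
    }
    return keyOrder.Select(k => (Key: k, Items: buckets[k])).ToList();
}

var words = new[] { "apple", "bob", "avocado", "cat", "banana" };
var mine = GroupBySketch(words, w => w[0]);
var linq = words.GroupBy(w => w[0]).ToList();

foreach (var (key, members) in mine)
    Console.WriteLine($"{key}: {string.Join(", ", members)}");
// a: apple, avocado
// b: bob, banana
// c: cat
```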

我的奇迹 2024-09-08 01:46:45

I've long known that .Count() returns .Count if the enumeration is an IList.

But I was always a bit wary about the run-time complexity of the Set operations: .Intersect(), .Except(), .Union().

Here's the decompiled BCL (.NET 4.0/4.5) implementation for .Intersect() (comments mine):

private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in second)                    // O(M)
    set.Add(source);                                    // O(1)

  foreach (TSource source in first)                     // O(N)
  {
    if (set.Remove(source))                             // O(1)
      yield return source;
  }
}

Conclusions:

  • the performance is O(M + N)
  • the implementation doesn't take advantage of the case where the collections already are sets. (This isn't necessarily straightforward, because the set's IEqualityComparer<T> would also need to match the one being used.)
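The decompiled code relies on the internal `Set<TSource>`, so it cannot be compiled as-is. Below is a runnable approximation using the public `HashSet<T>` instead (`IntersectSketch` is a hypothetical name, not a BCL method). It also bears on the question about the `IEqualityComparer` overloads: they go through the same hashed path, calling the comparer's `GetHashCode` and `Equals`, so they stay O(N + M) as long as the comparer produces well-distributed hash codes; a comparer that returns the same hash code for everything would push this toward quadratic time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical re-creation of IntersectIterator on top of the public HashSet<T>.
IEnumerable<T> IntersectSketch<T>(IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T>? comparer = null)
{
    // O(M): hash every element of the second sequence (a null comparer means the default one).
    var set = new HashSet<T>(second, comparer);

    // O(N): one expected-O(1) probe per element of the first sequence.
    foreach (var item in first)
    {
        // Remove rather than Contains, so each match is yielded at most once, as in the original.
        if (set.Remove(item))
            yield return item;
    }
}

var a = new[] { 1, 2, 2, 3, 4 };
var b = new[] { 2, 4, 4, 6 };
Console.WriteLine(string.Join(", ", IntersectSketch(a, b))); // 2, 4

var names = new[] { "Ann", "bob" };
var other = new[] { "ANN", "Bob", "cy" };
Console.WriteLine(string.Join(", ", IntersectSketch(names, other, StringComparer.OrdinalIgnoreCase))); // Ann, bob
```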

For completeness, here are the implementations for .Union() and .Except().

Spoiler alert: they, too, have O(N+M) complexity.

private static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in first)
  {
    if (set.Add(source))
      yield return source;
  }
  foreach (TSource source in second)
  {
    if (set.Add(source))
      yield return source;
  }
}


private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in second)
    set.Add(source);
  foreach (TSource source in first)
  {
    if (set.Add(source))
      yield return source;
  }
}

飘落散花 2024-09-08 01:46:45

All you can really bank on is that the Enumerable methods are well-written for the general case and won't use naive algorithms. There is probably third-party material (blogs, etc.) that describes the algorithms actually in use, but these descriptions are not official or guaranteed in the sense that STL algorithms are.

To illustrate, here is the reflected source code (courtesy of ILSpy) for Enumerable.Count from System.Core:

// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}

As you can see, it goes to some effort to avoid the naive solution of simply enumerating every element.
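On .NET 6 and later, the same "cheap count" check is exposed publicly as `Enumerable.TryGetNonEnumeratedCount`, which reports whether `Count()` could be answered without enumerating. A minimal sketch, assuming .NET 6+ and C# top-level statements; `Generate` is an illustrative iterator:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A plain iterator: it does not implement ICollection<T>, so its length is unknown up front.
IEnumerable<int> Generate(int n)
{
    for (int i = 0; i < n; i++)
        yield return i;
}

var list = new List<int> { 10, 20, 30 };

// Reports whether the count is available without walking the sequence.
bool listIsCheap = list.TryGetNonEnumeratedCount(out int listCount);
bool iteratorIsCheap = Generate(3).TryGetNonEnumeratedCount(out int iteratorCount);

Console.WriteLine($"{listIsCheap} {listCount}");         // True 3
Console.WriteLine($"{iteratorIsCheap} {iteratorCount}"); // False 0

// Count() works on both; only the iterator needs a full O(N) enumeration.
Console.WriteLine(Generate(3).Count()); // 3
```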

护你周全 2024-09-08 01:46:45

I just broke out Reflector, and they do check the underlying type when Contains is called.

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
{
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Contains(value);
    }
    return source.Contains<TSource>(value, null);
}
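One observable consequence of that dispatch, sketched with made-up data: when the source is a `HashSet<T>` created with a case-insensitive comparer, `Enumerable.Contains` hands the call to the set's own `Contains`, so you get the set's comparer as well as its O(1) lookup. Once the set is hidden behind another operator such as `Select`, the type check no longer matches and LINQ falls back to a linear walk with the default comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var fruit = new[] { "apple", "banana" };

// A set whose own notion of equality is case-insensitive.
var set = new HashSet<string>(fruit, StringComparer.OrdinalIgnoreCase);
var list = new List<string>(fruit);

// Call the LINQ extension explicitly so the instance methods are not picked by overload resolution.
bool inSet = Enumerable.Contains(set, "APPLE");   // forwarded to HashSet<T>.Contains
bool inList = Enumerable.Contains(list, "APPLE"); // forwarded to List<T>.Contains (case-sensitive)

// Behind Select, the source is no longer an ICollection<T>: linear walk, default comparer.
bool walked = set.Select(s => s).Contains("APPLE");

Console.WriteLine($"{inSet} {inList} {walked}"); // True False False
```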

请恋爱 2024-09-08 01:46:45

The correct answer is "it depends". It depends on what type the underlying IEnumerable is. I know that for some collections (like collections that implement ICollection or IList) there are special code paths; however, the actual implementation is not guaranteed to do anything special. For example, I know that ElementAt() has a special case for indexable collections, and similarly Count(). But in general you should probably assume worst-case O(n) performance.
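The special-casing described above follows a simple pattern that you can also apply when reimplementing an operator for your own collection. A hypothetical `ElementAtSketch` (illustrative only, not the BCL's code) that takes the O(1) path for `IList<T>` and otherwise walks the sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: O(1) for IList<T>, O(index) walk for everything else.
T ElementAtSketch<T>(IEnumerable<T> source, int index)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));

    // Fast path: arrays and List<T> implement IList<T>, so jump straight to the element.
    if (source is IList<T> list)
        return list[index]; // the collection's indexer performs its own bounds check

    // Slow path: walk until the requested position is reached.
    foreach (var item in source)
    {
        if (index == 0) return item;
        index--;
    }
    throw new ArgumentOutOfRangeException(nameof(index));
}

var numbers = new List<int> { 5, 6, 7, 8 };
Console.WriteLine(ElementAtSketch(numbers, 2));                    // 7, via the indexer
Console.WriteLine(ElementAtSketch(numbers.Where(n => n > 5), 1)); // 7, via enumeration
```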

In general I don't think you are going to find the kind of performance guarantees you want, though if you do run into a particular performance problem with a LINQ operator, you can always just reimplement it for your particular collection. Also, there are many blogs and extensibility projects which extend LINQ to Objects to add these kinds of performance guarantees. Check out Indexed LINQ, which extends and adds to the operator set for more performance benefits.
