What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

Posted on 2024-09-01 01:46:45

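I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable LINQ-to-Objects provider. Further, let's assume that any Func passed in as a selector, mutator, etc. is a cheap O(1) operation.

It seems obvious that all the single-pass operations (Select, Where, Count, Take/Skip, Any/All, etc.) will be O(n), since they only need to walk the sequence once; although even this is subject to laziness.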
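A minimal sketch of the laziness caveat, assuming a hypothetical `LazinessDemo` helper and a made-up one-million-element range: building the query runs nothing, and enumerating it only pays for the elements actually consumed.

```csharp
using System;
using System.Linq;

static class LazinessDemo
{
    // Hypothetical helper: reports how often the selector ran before and
    // after enumerating a Select(...).Take(take) query.
    public static int SelectorCalls(int take, out int callsBeforeEnumeration)
    {
        int calls = 0;

        // Building the query does not execute the selector (deferred execution).
        var query = Enumerable.Range(1, 1000000)
            .Select(x => { calls++; return x * 2; })
            .Take(take);

        callsBeforeEnumeration = calls;

        // Enumeration happens here; only `take` elements are pulled through Select.
        var materialized = query.ToList();
        return calls;
    }

    static void Main()
    {
        int before;
        int after = SelectorCalls(3, out before);
        Console.WriteLine(before); // 0
        Console.WriteLine(after);  // 3
    }
}
```

So the O(n) figure for a single-pass operator is an upper bound on a full enumeration; a consumer that stops early pays only for what it reads.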

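Things are murkier for the more complex operations; the set-like operators (Union, Distinct, Except, etc.) work using GetHashCode by default (as far as I know), so it seems reasonable to assume they use a hash table internally, making these operations O(n) as well, in general. What about the versions that take an IEqualityComparer?

OrderBy would need a sort, so most likely we're looking at O(n log n). What if the input is already sorted? How about if I say OrderBy().ThenBy() and provide the same key to both?

I could see GroupBy (and Join) using either sorting or hashing. Which is it?

Contains would be O(n) on a List, but O(1) on a HashSet. Does LINQ check the underlying container to see if it can speed things up?

And the real question: so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?

More questions (in response to comments):
I hadn't really thought about overhead, but I didn't expect there to be very much for simple Linq-to-Objects. The CodingHorror post is talking about Linq-to-SQL, where I can understand that parsing the query and generating SQL would add cost. Is there a similar cost for the Objects provider too? If so, is it different depending on whether you use the declarative or the functional syntax?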


一腔孤↑勇 2024-09-08 01:46:45

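There are very, very few guarantees, but there are a few optimizations:

Extension methods that use indexed access, such as ElementAt, Skip, Last or LastOrDefault, will check whether the underlying type implements IList<T>, so that you get O(1) access instead of O(N).
The Count method checks for an ICollection implementation, so that this operation is O(1) instead of O(N).
Distinct, GroupBy, Join, and I believe also the set-aggregation methods (Union, Intersect and Except) use hashing, so they should be close to O(N) instead of O(N²).
Contains checks for an ICollection<T> implementation, so it may be O(1) if the underlying collection's own Contains is O(1), as with HashSet<T>, but this depends on the actual data structure and is not guaranteed. Hash sets provide their own hash-based Contains, which is why they are O(1).
OrderBy methods use a stable quicksort, so they are O(N log N) in the average case.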

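I think that covers most, if not all, of the built-in extension methods. There really are very few performance guarantees; Linq itself will try to take advantage of efficient data structures, but that isn't a free pass to write potentially inefficient code.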
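To make the "stable" part of the OrderBy point concrete, here is a small sketch. The `StableSortDemo` class and its sample data are made up for illustration; the only library behavior relied on is that OrderBy keeps elements with equal keys in their original relative order.

```csharp
using System;
using System.Linq;

static class StableSortDemo
{
    // Hypothetical sample data: three items share Priority 1.
    static readonly (string Name, int Priority)[] Items =
    {
        ("b", 1), ("a", 1), ("c", 0), ("d", 1),
    };

    // Sort by priority only; ties keep their input order (b, a, d).
    public static string ByPriority() =>
        string.Join(",", Items.OrderBy(i => i.Priority).Select(i => i.Name));

    // ThenBy breaks ties explicitly by name.
    public static string ByPriorityThenName() =>
        string.Join(",", Items.OrderBy(i => i.Priority).ThenBy(i => i.Name).Select(i => i.Name));

    static void Main()
    {
        Console.WriteLine(ByPriority());         // c,b,a,d
        Console.WriteLine(ByPriorityThenName()); // c,a,b,d
    }
}
```

Stability is a matter of ordering, not of complexity: the sketch says nothing about how many comparisons the sort performs.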


我的奇迹 2024-09-08 01:46:45

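I've long known that .Count() returns .Count if the enumeration is an IList.

But I was always a bit wary about the run-time complexity of the set operations: .Intersect(), .Except(), .Union().

Here's the decompiled BCL (.NET 4.0/4.5) implementation of .Intersect() (comments mine):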

private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in second)                    // O(M)
    set.Add(source);                                    // O(1)

  foreach (TSource source in first)                     // O(N)
  {
    if (set.Remove(source))                             // O(1)
      yield return source;
  }
}

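Conclusions:

the performance is O(M + N)
the implementation doesn't take advantage of the case where the collections are already sets. (Doing so may not be straightforward, because the IEqualityComparer<T> in use would also need to match.)

For completeness, here are the implementations of .Union() and .Except().

Spoiler alert: they, too, have O(N+M) complexity.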

private static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in first)
  {
    if (set.Add(source))
      yield return source;
  }
  foreach (TSource source in second)
  {
    if (set.Add(source))
      yield return source;
  }
}


private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
  Set<TSource> set = new Set<TSource>(comparer);
  foreach (TSource source in second)
    set.Add(source);
  foreach (TSource source in first)
  {
    if (set.Add(source))
      yield return source;
  }
}
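A short usage sketch of the three operators, with made-up inputs and a hypothetical `SetOpsDemo` helper. It shows what the iterators above imply: results are de-duplicated and follow the order of `first`. The last lines use HashSet<T>.IntersectWith as one possible alternative when you already hold a set; that is a suggestion, not something the LINQ operators do for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SetOpsDemo
{
    // Hypothetical helper: renders a sequence as "a,b,c".
    public static string Show(IEnumerable<int> xs) => string.Join(",", xs);

    static void Main()
    {
        int[] first = { 1, 2, 2, 3, 4 };
        int[] second = { 2, 4, 4, 5 };

        Console.WriteLine(Show(first.Intersect(second))); // 2,4
        Console.WriteLine(Show(first.Union(second)));     // 1,2,3,4,5
        Console.WriteLine(Show(first.Except(second)));    // 1,3

        // If you already own a HashSet<T>, its in-place method works on the
        // existing set instead of building a fresh internal one.
        var set = new HashSet<int>(first);
        set.IntersectWith(second);
        Console.WriteLine(set.SetEquals(new[] { 2, 4 })); // True
    }
}
```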

飘落散花 2024-09-08 01:46:45

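All you can really bank on is that the Enumerable methods are well written for the general case and won't use naive algorithms. There is probably third-party material (blogs, etc.) describing the algorithms actually in use, but it is not official, and nothing is guaranteed in the sense that the complexity of STL algorithms is.

To illustrate, here is the reflected source code (courtesy of ILSpy) for Enumerable.Count from System.Core: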

// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}

As you can see, it goes to some effort to avoid the naive solution of simply enumerating every element.
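One way to observe that fast path from the outside is sketched below. `CountOnly` is a hypothetical type invented for this example: it reports a size through ICollection<T>.Count but throws if anyone tries to enumerate it, so Count() can only succeed by taking the ICollection<T> branch.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: knows its size, refuses to be enumerated.
sealed class CountOnly : ICollection<int>
{
    public int Count => 1000000;
    public bool IsReadOnly => true;

    public IEnumerator<int> GetEnumerator() =>
        throw new InvalidOperationException("the sequence was enumerated");
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // The remaining members are irrelevant to the demonstration.
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(int item) => throw new NotSupportedException();
    public void CopyTo(int[] array, int arrayIndex) => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}

static class CountDemo
{
    static void Main()
    {
        IEnumerable<int> source = new CountOnly();

        // Completes without touching GetEnumerator, i.e. without walking the elements.
        Console.WriteLine(source.Count()); // 1000000
    }
}
```

Note that this only holds for the parameterless Count(); the overload taking a predicate has to inspect every element.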

护你周全 2024-09-08 01:46:45

I just broke out Reflector, and they do check the underlying type when Contains is called.

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
{
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Contains(value);
    }
    return source.Contains<TSource>(value, null);
}
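A consequence of that delegation, sketched with a made-up case-insensitive set and a hypothetical `ContainsDemo` class: the two-argument Contains hands the lookup to the collection, so the collection's own comparer decides the result, while the overload with an explicit comparer compares elements one by one using the comparer you pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ContainsDemo
{
    // A HashSet<string> that ignores case, viewed only as IEnumerable<string>.
    static readonly IEnumerable<string> Seq =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "hello" };

    // Delegates to ICollection<T>.Contains: a hash lookup using the set's comparer.
    public static bool ViaCollection() => Seq.Contains("HELLO");

    // Explicit comparer: the elements are compared with StringComparer.Ordinal.
    public static bool ViaExplicitComparer() => Seq.Contains("HELLO", StringComparer.Ordinal);

    static void Main()
    {
        Console.WriteLine(ViaCollection());       // True
        Console.WriteLine(ViaExplicitComparer()); // False
    }
}
```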

请恋爱 2024-09-08 01:46:45

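The correct answer is "it depends": it depends on the type of the underlying IEnumerable. I know that for some collections (such as those implementing ICollection or IList) special code paths are used, but the actual implementation is not guaranteed to do anything special. For example, I know that ElementAt() has a special case for indexable collections, and similarly Count(). But in general you should probably assume the worst case, O(n) performance.

In general, I don't think you are going to find the kind of performance guarantees you want, though if you do run into a particular performance problem with a LINQ operator, you can always reimplement it for your particular collection. There are also many blogs and extensibility projects that extend Linq to Objects to add these kinds of performance guarantees. Check out Indexed LINQ, which extends and adds to the operator set for more performance benefits.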
