Instantiation of recursive generic types slows down exponentially with nesting depth. Why?

Posted 2024-11-29 23:22:10

Note: I may have chosen the wrong word in the title; perhaps I'm really talking about polynomial growth here. See the benchmark result at the end of this question.

Let's start with these three recursive generic interfaces that represent immutable stacks:

interface IStack<T>
{
    INonEmptyStack<T, IStack<T>> Push(T x);
}

interface IEmptyStack<T> : IStack<T>
{
    new INonEmptyStack<T, IEmptyStack<T>> Push(T x);
}

interface INonEmptyStack<T, out TStackBeneath> : IStack<T>
    where TStackBeneath : IStack<T>
{
    T Top { get; }
    TStackBeneath Pop();
    new INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x);
}

I've created straightforward implementations, EmptyStack<T> and NonEmptyStack<T, TStackBeneath>.

Update #1: See the code below.

I've noticed the following things about their runtime performance:

  • Pushing 1,000 items onto an EmptyStack<int> for the first time takes more than 7 seconds.
  • Pushing 1,000 items onto an EmptyStack<int> takes virtually no time at all afterwards.
  • Performance gets exponentially worse the more items I push onto the stack.

Update #2:

  • I've finally performed a more precise measurement. See the benchmark code and results below.

  • I've only discovered during these tests that .NET 3.5 doesn't seem to allow generic types with a recursion depth ≥ 100. .NET 4 doesn't seem to have this restriction.

The first two facts make me suspect that the slow performance is not due to my implementation, but rather to the type system: .NET has to instantiate 1,000 distinct closed generic types, i.e.:

  • EmptyStack<int>
  • NonEmptyStack<int, EmptyStack<int>>
  • NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>
  • NonEmptyStack<int, NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>>
  • etc.
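
To make this concrete, here is a minimal sketch (not part of the original question; it relies on the EmptyStack<T>/NonEmptyStack<T, TStackBeneath> implementations shown in Update #1 below) that prints the runtime type after each Push; every iteration yields a new closed generic type:

IStack<int> s = new EmptyStack<int>();
for (int i = 0; i < 4; ++i)
{
    Console.WriteLine(s.GetType());   // a different closed generic type on every iteration
    s = s.Push(i);                    // wraps the current runtime type in another NonEmptyStack<,>
}
// The output starts with EmptyStack`1[System.Int32] and then shows
// NonEmptyStack`2[...] types that are nested one level deeper each time.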

Questions:

  1. Is my above assessment correct?
  2. If so, why does instantiation of generic types such as T<U>, T<T<U>>, T<T<T<U>>>, and so on get exponentially slower the deeper they are nested?
  3. Are CLR implementations other than .NET (Mono, Silverlight, .NET Compact etc.) known to exhibit the same characteristics?

Off-topic footnote: These types are quite interesting, by the way, because they allow the compiler to catch certain errors such as:

stack.Push(item).Pop().Pop();
//                    ^^^^^^
// causes compile-time error if 'stack' is not known to be non-empty.

Or you can express requirements for certain stack operations:

TStackBeneath PopTwoItems<T, TStackBeneath>
              (INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
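
For illustration, here is a minimal sketch (the StackOperations helper is hypothetical and not part of the original post) of how such a requirement could be implemented and used; it compiles only when the argument is statically known to contain at least two items:

internal static class StackOperations   // hypothetical helper, for illustration only
{
    public static TStackBeneath PopTwoItems<T, TStackBeneath>(
        INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
        where TStackBeneath : IStack<T>
    {
        return stack.Pop().Pop();   // each Pop() peels off one statically tracked layer
    }
}

// Usage: accepted by the compiler only because two pushes are visible in the type.
// var rest = StackOperations.PopTwoItems(new EmptyStack<int>().Push(1).Push(2));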

Update #1: Implementation of the above interfaces

internal class EmptyStack<T> : IEmptyStack<T>
{
    public INonEmptyStack<T, IEmptyStack<T>> Push(T x)
    {
        return new NonEmptyStack<T, IEmptyStack<T>>(x, this);
    }

    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}
// ^ this could be made into a singleton per type T

internal class NonEmptyStack<T, TStackBeneath> : INonEmptyStack<T, TStackBeneath>
    where TStackBeneath : IStack<T>
{
    private readonly T top;
    private readonly TStackBeneath stackBeneathTop;

    public NonEmptyStack(T top, TStackBeneath stackBeneathTop)
    {
        this.top = top;
        this.stackBeneathTop = stackBeneathTop;
    }

    public T Top { get { return top; } }

    public TStackBeneath Pop()
    {
        return stackBeneathTop;
    }

    public INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x)
    {
        return new NonEmptyStack<T, INonEmptyStack<T, TStackBeneath>>(x, this);
    }

    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}

Update #2: Benchmark code and results

I used the following code to measure recursive generic type instantiation times for .NET 4 on a Windows 7 SP 1 x64 (Intel U4100 @ 1.3 GHz, 4 GB RAM) notebook. This is a different, faster machine than the one I originally used, so the results do not match with the statements above.

Console.WriteLine("N, t [ms]");
int outerN = 0;
while (true)
{
    outerN++;
    var appDomain = AppDomain.CreateDomain(outerN.ToString());
    appDomain.SetData("n", outerN);
    appDomain.DoCallBack(delegate {
        int n = (int)AppDomain.CurrentDomain.GetData("n");
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        IStack<int> s = new EmptyStack<int>();
        for (int i = 0; i < n; ++i)
        {
            s = s.Push(i);  // <-- this "creates" a new type
        }
        stopwatch.Stop();
        long ms = stopwatch.ElapsedMilliseconds;
        Console.WriteLine("{0}, {1}", n, ms);
    });
    AppDomain.Unload(appDomain);
}

(Each measurement is taken in a separate app domain because this ensures that all runtime types will have to be re-created in each loop iteration.)

Here's an X-Y plot of the output:

Line chart showing a measurement for recursive generic type instantiation times

  • Horizontal axis: N denotes the depth of type recursion, i.e.:

    • N = 1 indicates a NonEmptyStack<T, EmptyStack<T>>
    • N = 2 indicates a NonEmptyStack<T, NonEmptyStack<T, EmptyStack<T>>>
    • etc.
  • Vertical axis: t is the time (in milliseconds) required to push N integers onto a stack. (The time needed to create runtime types, if that actually happens, is included in this measurement.)

Comments (4)

变身佩奇 2024-12-06 23:22:10

Accessing a new type causes the runtime to compile it from IL to native code (x86, etc.). The runtime also optimizes the code, which will also produce different results for value types and reference types.

And List<int> clearly will be optimized differently than List<List<int>>.

Thus also EmptyStack<int> and NonEmptyStack<int, EmptyStack<int>> and so on will be handled as completely different types and will all be 'recompiled' and optimized.
(As far as I know!)

By nesting further layers the complexity of the resulting type grows and the optimization takes longer.

So adding one layer takes 1 step to recompile and optimize, the next layer takes roughly 2 steps, the third layer 3 steps, and so on; pushing N items therefore costs on the order of 1 + 2 + ... + N = N(N + 1)/2 steps in total, i.e. quadratic (polynomial) rather than truly exponential growth.
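
This per-type cost can be observed directly. The following minimal sketch (not part of the original answer; it assumes the EmptyStack<T>/NonEmptyStack<T, TStackBeneath> types from the question) times the same sequence of pushes twice in one process; the second run should be much faster because all the closed types have already been created and compiled:

const int N = 1000;
for (int run = 1; run <= 2; ++run)
{
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    IStack<int> s = new EmptyStack<int>();
    for (int i = 0; i < N; ++i)
    {
        s = s.Push(i);   // first run: a new closed generic type is created per push
    }
    stopwatch.Stop();
    Console.WriteLine("Run {0}: {1} ms", run, stopwatch.ElapsedMilliseconds);
}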

扛起拖把扫天下 2024-12-06 23:22:10

If James and other people are correct about types being created at runtime, then performance is limited by the speed of type creation. So why is type creation exponentially slow? I think that, by definition, the types are all different from each other. Consequently, each new type causes a series of increasingly different memory allocation and deallocation patterns. The speed is simply limited by how efficiently the GC can manage memory automatically. There are some aggressive allocation sequences that will slow down any memory manager, no matter how good it is. The GC and the allocator will spend more and more time looking for optimally sized pieces of free memory for each subsequent allocation.

Answer:

Because you have found a very aggressive allocation sequence, one that fragments memory so badly and so quickly that the GC gets hopelessly confused.

What one can learn from this is that really fast real-world apps (for example, algorithmic stock-trading apps) are very plain pieces of straight-line code with static data structures that are allocated only once for the whole run of the application.

尘曦 2024-12-06 23:22:10

In Java, computation time appears to be a little more than linear, and far more efficient than what you're reporting in .NET. Using the testRandomPopper method from my answer, it takes ~4 seconds to run with N = 10,000,000 and ~10 seconds with N = 20,000,000.

尘世孤行 2024-12-06 23:22:10

Is there a desperate need to have a distinction between the empty stack and the non-empty stack?

From a practical point of view, you can't pop the value of an arbitrary stack without fully qualifying the type, and after adding 1,000 values that's an insanely long type name.

Why not just do this:

public interface IImmutableStack<T>
{
    T Top { get; }
    IImmutableStack<T> Pop { get; }
    IImmutableStack<T> Push(T x);
}

public class ImmutableStack<T> : IImmutableStack<T>
{
    private ImmutableStack(T top, IImmutableStack<T> pop)
    {
        this.Top = top;
        this.Pop = pop;
    }

    public T Top { get; private set; }
    public IImmutableStack<T> Pop { get; private set; }

    public static IImmutableStack<T> Push(T x)
    {
        return new ImmutableStack<T>(x, null);
    }

    IImmutableStack<T> IImmutableStack<T>.Push(T x)
    {
        return new ImmutableStack<T>(x, this);
    }
}

You can pass around any IImmutableStack<T> and you only need to check for Pop == null to know you've hit the end of the stack.

Otherwise this has the semantics you're trying to code without the performance penalty. I created a stack with 10,000,000 values in 1.873 seconds with this code.
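
For comparison with the question's benchmark, here is a minimal sketch (assumed, not included in the original answer) of how such a measurement could be taken with the flat ImmutableStack<T> above:

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
IImmutableStack<int> stack = ImmutableStack<int>.Push(0);   // static factory creates the bottom item
for (int i = 1; i < 10000000; ++i)
{
    stack = stack.Push(i);   // every push uses the same closed generic type
}
stopwatch.Stop();
Console.WriteLine("Pushed 10,000,000 values in {0} ms", stopwatch.ElapsedMilliseconds);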
