Note: I may have chosen the wrong word in the title; perhaps I'm really talking about polynomial growth here. See the benchmark result at the end of this question.
Let's start with these three recursive generic interfaces† that represent immutable stacks:
interface IStack<T>
{
INonEmptyStack<T, IStack<T>> Push(T x);
}
interface IEmptyStack<T> : IStack<T>
{
new INonEmptyStack<T, IEmptyStack<T>> Push(T x);
}
interface INonEmptyStack<T, out TStackBeneath> : IStack<T>
where TStackBeneath : IStack<T>
{
T Top { get; }
TStackBeneath Pop();
new INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x);
}
I've created straightforward implementations EmptyStack<T> and NonEmptyStack<T, TStackBeneath>.
Update #1: See the code below.
I've noticed the following things about their runtime performance:
- Pushing 1,000 items onto an EmptyStack<int> for the first time takes more than 7 seconds.
- Pushing 1,000 items onto an EmptyStack<int> takes virtually no time at all afterwards.
- Performance gets exponentially worse the more items I push onto the stack.
Update #2:
- I've finally performed a more precise measurement. See the benchmark code and results below.
- It was only during these tests that I discovered that .NET 3.5 doesn't seem to allow generic types with a recursion depth ≥ 100; .NET 4 doesn't seem to have this restriction.
The first two facts make me suspect that the slow performance is not due to my implementation, but rather to the type system: .NET has to instantiate 1,000 distinct closed generic types, i.e.:
- EmptyStack<int>
- NonEmptyStack<int, EmptyStack<int>>
- NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>
- NonEmptyStack<int, NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>>
- etc.
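To see the nesting in action, here is a small illustrative snippet using the interfaces above (the variable names are my own):
var s0 = new EmptyStack<int>();   // EmptyStack<int>
var s1 = s0.Push(1);              // static type: INonEmptyStack<int, IEmptyStack<int>>
var s2 = s1.Push(2);              // static type: INonEmptyStack<int, INonEmptyStack<int, IEmptyStack<int>>>
var s3 = s2.Push(3);              // ...each Push adds one more level of nesting
Console.WriteLine(s3.GetType());  // the runtime type is a correspondingly nested NonEmptyStack<...>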
Questions:
- Is my above assessment correct?
- If so, why does instantiation of generic types such as T<U>, T<T<U>>, T<T<T<U>>>, and so on get exponentially slower the deeper they are nested?
- Are CLR implementations other than .NET (Mono, Silverlight, .NET Compact etc.) known to exhibit the same characteristics?
†) Off-topic footnote: These types are quite interesting, by the way, because they allow the compiler to catch certain errors such as:
stack.Push(item).Pop().Pop();
// ^^^^^^
// causes compile-time error if 'stack' is not known to be non-empty.
Or you can express requirements for certain stack operations:
TStackBeneath PopTwoItems<T, TStackBeneath>
    (INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
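The original only shows the signature; a possible body (my sketch) could be:
TStackBeneath PopTwoItems<T, TStackBeneath>(
    INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
    where TStackBeneath : IStack<T>
{
    return stack.Pop().Pop();  // both Pops are guaranteed to be valid by the parameter type
}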
Update #1: Implementation of the above interfaces
internal class EmptyStack<T> : IEmptyStack<T>
{
public INonEmptyStack<T, IEmptyStack<T>> Push(T x)
{
return new NonEmptyStack<T, IEmptyStack<T>>(x, this);
}
INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
{
return Push(x);
}
}
// ^ this could be made into a singleton per type T
internal class NonEmptyStack<T, TStackBeneath> : INonEmptyStack<T, TStackBeneath>
where TStackBeneath : IStack<T>
{
private readonly T top;
private readonly TStackBeneath stackBeneathTop;
public NonEmptyStack(T top, TStackBeneath stackBeneathTop)
{
this.top = top;
this.stackBeneathTop = stackBeneathTop;
}
public T Top { get { return top; } }
public TStackBeneath Pop()
{
return stackBeneathTop;
}
public INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x)
{
return new NonEmptyStack<T, INonEmptyStack<T, TStackBeneath>>(x, this);
}
INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
{
return Push(x);
}
}
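Regarding the "// ^ this could be made into a singleton per type T" comment above, one way to do that (a sketch, not part of the original code; the class name here is made up to avoid clashing with EmptyStack<T>) is a private constructor plus a static instance:
internal sealed class SingletonEmptyStack<T> : IEmptyStack<T>
{
    // One shared empty stack per element type T.
    public static readonly SingletonEmptyStack<T> Instance = new SingletonEmptyStack<T>();

    private SingletonEmptyStack() { }  // callers use SingletonEmptyStack<T>.Instance instead of 'new'

    // The Push implementations stay exactly as in EmptyStack<T> above.
    public INonEmptyStack<T, IEmptyStack<T>> Push(T x)
    {
        return new NonEmptyStack<T, IEmptyStack<T>>(x, this);
    }
    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}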
Update #2: Benchmark code and results
I used the following code to measure recursive generic type instantiation times for .NET 4 on a Windows 7 SP 1 x64 (Intel U4100 @ 1.3 GHz, 4 GB RAM) notebook. This is a different, faster machine than the one I originally used, so the results do not match the statements above.
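// (Snippet assumes a console app with "using System;" and "using System.Diagnostics;" in scope.)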
Console.WriteLine("N, t [ms]");
int outerN = 0;
while (true)
{
outerN++;
var appDomain = AppDomain.CreateDomain(outerN.ToString());
appDomain.SetData("n", outerN);
appDomain.DoCallBack(delegate {
int n = (int)AppDomain.CurrentDomain.GetData("n");
var stopwatch = new Stopwatch();
stopwatch.Start();
IStack<int> s = new EmptyStack<int>();
for (int i = 0; i < n; ++i)
{
s = s.Push(i); // <-- this "creates" a new type
}
stopwatch.Stop();
long ms = stopwatch.ElapsedMilliseconds;
Console.WriteLine("{0}, {1}", n, ms);
});
AppDomain.Unload(appDomain);
}
(Each measurement is taken in a separate app domain because this ensures that all runtime types will have to be re-created in each loop iteration.)
Here's an X-Y plot of the output:
- Horizontal axis: N denotes the depth of type recursion, i.e.:
  - N = 1 indicates a NonEmptyStack<EmptyStack<T>>
  - N = 2 indicates a NonEmptyStack<NonEmptyStack<EmptyStack<T>>>
  - etc.
- Vertical axis: t is the time (in milliseconds) required to push N integers onto a stack. (The time needed to create runtime types, if that actually happens, is included in this measurement.)
Answers (4)
Accessing a new type causes the runtime to compile it from IL to native code (x86 etc.). The runtime also optimizes the code, which will produce different results for value types and reference types.
And List<int> will clearly be optimized differently than List<List<int>>. Thus EmptyStack<int>, NonEmptyStack<int, EmptyStack<int>> and so on will also be handled as completely different types and will all be 'recompiled' and optimized. (As far as I know!)
By nesting further layers, the complexity of the resulting type grows and the optimization takes longer.
So adding one layer takes 1 step to recompile and optimize, the next layer takes 2 steps plus the first step (or so), and the 3rd layer takes 1 + 2 + 3 steps, etc.
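This is consistent with the question's observation that the first run of 1,000 pushes is slow and later runs are almost free. A rough way to see the one-time cost (a sketch reusing the EmptyStack<T>/NonEmptyStack<> implementation from the question) is:
var sw = Stopwatch.StartNew();
IStack<int> a = new EmptyStack<int>();
for (int i = 0; i < 1000; i++) a = a.Push(i);
Console.WriteLine("first run:  {0} ms", sw.ElapsedMilliseconds);   // pays for creating/compiling ~1,000 closed types

sw.Restart();
IStack<int> b = new EmptyStack<int>();
for (int i = 0; i < 1000; i++) b = b.Push(i);
Console.WriteLine("second run: {0} ms", sw.ElapsedMilliseconds);   // the same closed types already exist, so this is fast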
If James and other people are correct about types being created at runtime, then performance is limited by the speed of type creation. So why is type creation exponentially slow? I think that, by definition, the types are all different from each other. Consequently, every new type causes a series of increasingly different memory allocation and deallocation patterns. The speed is simply limited by how efficiently the GC manages memory automatically. There are some aggressive sequences that will slow down any memory manager, no matter how good it is: the GC and allocator will spend more and more time looking for optimally sized pieces of free memory for each subsequent allocation and size.
Answer:
Because you have found one very aggressive sequence, which fragments memory so badly and so quickly that the GC becomes hopelessly confused.
What one can learn from this is that really fast real-world apps (for example, algorithmic stock-trading apps) are very plain pieces of straight-line code with static data structures, allocated only once for the whole run of the application.
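One way to sanity-check the GC hypothesis (my suggestion, not part of the original answer) is to count collections around the pushes, again reusing the question's stack implementation:
int gen0Before = GC.CollectionCount(0);
int gen2Before = GC.CollectionCount(2);

IStack<int> s = new EmptyStack<int>();
for (int i = 0; i < 1000; i++) s = s.Push(i);

// If garbage collection were the bottleneck, these deltas would be large;
// a thousand small allocations normally trigger few, if any, collections.
Console.WriteLine("gen0 collections: {0}", GC.CollectionCount(0) - gen0Before);
Console.WriteLine("gen2 collections: {0}", GC.CollectionCount(2) - gen2Before);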
In Java, computation time appears to be a little more than linear and far more efficient than what you're reporting in .NET. Using the testRandomPopper method from my answer, it takes ~4 seconds to run with N=10,000,000 and ~10 seconds with N=20,000,000.
Is there a desperate need to have a distinction between the empty stack and the non-empty stack?
From a practical point of view, you can't pop the value of an arbitrary stack without fully qualifying the type, and after adding 1,000 values that's an insanely long type name.
Why not just do this:
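(The answer's original code isn't included in this excerpt; below is a minimal sketch of the kind of stack being described. The interface name IImmutableStack<T> comes from the answer's text; the class ImmutableStack<T> and its exact shape are my assumptions.)
public interface IImmutableStack<T>
{
    T Top { get; }
    IImmutableStack<T> Pop { get; }          // null means this is the bottom element
    IImmutableStack<T> Push(T value);
}

public sealed class ImmutableStack<T> : IImmutableStack<T>
{
    private readonly T top;
    private readonly IImmutableStack<T> pop;

    public ImmutableStack(T top, IImmutableStack<T> pop = null)
    {
        this.top = top;
        this.pop = pop;
    }

    public T Top { get { return top; } }
    public IImmutableStack<T> Pop { get { return pop; } }

    public IImmutableStack<T> Push(T value)
    {
        return new ImmutableStack<T>(value, this);   // every node has the same closed type
    }
}
Because the recursion goes through the single closed type ImmutableStack<T>, pushing N items only allocates N nodes and never forces the runtime to create new generic instantiations.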
You can pass around any IImmutableStack<T> and you only need to check for Pop == null to know you've hit the end of the stack.
Otherwise this has the semantics you're trying to code without the performance penalty. I created a stack with 10,000,000 values in 1.873 seconds with this code.