Why do BCL collections use struct enumerators rather than classes?
We all know mutable structs are evil in general. I'm also pretty sure that because IEnumerable<T>.GetEnumerator() returns type IEnumerator<T>, the structs are immediately boxed into a reference type, costing more than if they were simply reference types to begin with.
So why, in the BCL generic collections, are all the enumerators mutable structs? Surely there had to have been a good reason. The only thing that occurs to me is that structs can be copied easily, thus preserving the enumerator state at an arbitrary point. But adding a Copy() method to the IEnumerator interface would have been less troublesome, so I don't see this as being a logical justification on its own.
Even if I don't agree with a design decision, I would like to be able to understand the reasoning behind it.
Comments (2)
Indeed, it is for performance reasons. The BCL team did a lot of research on this point before deciding to go with what you rightly call out as a suspicious and dangerous practice: the use of a mutable value type.
You ask why this doesn't cause boxing. It's because the C# compiler does not generate code to box stuff to IEnumerable or IEnumerator in a foreach loop if it can avoid it!
When we see a loop such as foreach (X x in c), the first thing we do is check to see if c has a method called GetEnumerator. If it does, then we check to see whether the type it returns has a method MoveNext and a property Current. If it does, then the foreach loop is generated entirely using direct calls to those methods and properties. Only if "the pattern" cannot be matched do we fall back to looking for the interfaces.
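A minimal sketch of that pattern matching, assuming a hypothetical collection IntRange that implements no interfaces at all (IntRange, its Enumerator, and PatternDemo are illustrative names, not BCL types). SumExpanded is only an approximation of what the compiler emits for the foreach in SumWithForeach.

```csharp
using System;

// Hypothetical collection: no IEnumerable, no IEnumerator, just the pattern.
public class IntRange
{
    private readonly int _start;
    private readonly int _count;

    public IntRange(int start, int count) { _start = start; _count = count; }

    // foreach only needs a public GetEnumerator() method...
    public Enumerator GetEnumerator() { return new Enumerator(_start, _count); }

    // ...whose return type has MoveNext() and Current. A struct qualifies.
    public struct Enumerator
    {
        private readonly int _end;
        private int _next;
        private int _current;

        public Enumerator(int start, int count)
        {
            _next = start;
            _end = start + count;
            _current = 0;
        }

        // Typed as int, so reading it never boxes.
        public int Current { get { return _current; } }

        public bool MoveNext()
        {
            if (_next >= _end) return false;
            _current = _next++;
            return true;
        }
    }
}

public static class PatternDemo
{
    public static int SumWithForeach(IntRange range)
    {
        int sum = 0;
        foreach (int x in range) sum += x;   // binds to the pattern, not to interfaces
        return sum;
    }

    // Approximately what the loop above is turned into.
    public static int SumExpanded(IntRange range)
    {
        int sum = 0;
        IntRange.Enumerator e = range.GetEnumerator();
        while (e.MoveNext()) sum += e.Current;
        return sum;
    }
}
```

Both methods should give the same result for the same range, since the second is just the first written out by hand.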
This has two desirable effects.
First, if the collection is, say, a collection of ints, but was written before generic types were invented, then it does not take the boxing penalty of boxing the value of Current to object and then unboxing it to int. If Current is a property that returns an int, we just use it.
Second, if the enumerator is a value type then it does not box the enumerator to IEnumerator.
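To make the second point concrete, here is a small sketch contrasting the unboxed struct enumerator with the same enumerator obtained through the interface. BoxingDemo and its method names are made up for this example. Copying the struct yields an independent snapshot, while copying the interface reference only aliases the one boxed object on the heap.

```csharp
using System;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Direct use: List<int>.GetEnumerator() returns the struct List<int>.Enumerator.
    public static int CurrentOfStructCopy()
    {
        var list = new List<int> { 1, 2, 3 };
        List<int>.Enumerator e = list.GetEnumerator();
        e.MoveNext();                       // e is positioned on 1
        List<int>.Enumerator copy = e;      // value copy: an independent snapshot
        e.MoveNext();                       // e moves on to 2, copy stays on 1
        return copy.Current;
    }

    // Through the interface: the struct is boxed, and both variables
    // refer to the same heap object.
    public static int CurrentOfInterfaceAlias()
    {
        IEnumerable<int> seq = new List<int> { 1, 2, 3 };
        IEnumerator<int> a = seq.GetEnumerator();
        a.MoveNext();                       // positioned on 1
        IEnumerator<int> b = a;             // reference copy: same boxed enumerator
        a.MoveNext();                       // the shared object moves on to 2
        return b.Current;
    }
}
```

The first method sees the snapshot taken before the second MoveNext, and the second sees the shared, already advanced enumerator.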
Like I said, the BCL team did a lot of research on this and discovered that the vast majority of the time, the penalty of allocating and deallocating the enumerator was large enough that it was worth making it a value type, even though doing so can cause some crazy bugs.
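For example, consider a using statement that declares a variable h of a disposable struct type and then assigns a new value to h inside the block.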
You would quite rightly expect the attempt to mutate h to fail, and indeed it does. The compiler detects that you are trying to change the value of something that has a pending disposal, and that doing so might cause the object that needs to be disposed to actually not be disposed.
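A sketch of that situation, assuming a hypothetical disposable struct named MyHandle (the type and its DisposeCount counter are inventions for this example). The offending assignment is left as a comment so that the rest compiles.

```csharp
using System;

// Hypothetical disposable value type, for illustration only.
public struct MyHandle : IDisposable
{
    public static int DisposeCount;        // lets the example observe disposal

    public void Dispose() { DisposeCount++; }
}

public static class UsingAssignmentDemo
{
    public static int Run()
    {
        MyHandle.DisposeCount = 0;
        using (MyHandle h = new MyHandle())
        {
            // h = new MyHandle();   // rejected at compile time: h is a using variable
        }
        return MyHandle.DisposeCount;      // Dispose ran once, on the original value
    }
}
```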
Now suppose that, instead of assigning to h, the block called a method on h that mutates it.
What happens here? You might reasonably expect that the compiler would do what it does if h were a readonly field: make a copy, and mutate the copy in order to ensure that the method does not throw away stuff in the value that needs to be disposed.
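The readonly-field behaviour referred to here can be seen in isolation. This is a minimal sketch with a hypothetical mutable struct Counter and a class Holder, neither of which comes from the original answer.

```csharp
using System;

// Hypothetical mutable struct, for illustration only.
public struct Counter
{
    public int Value;

    public void Increment() { Value++; }
}

public class Holder
{
    private readonly Counter _readOnly = new Counter();
    private Counter _mutable = new Counter();

    public int IncrementReadOnly()
    {
        _readOnly.Increment();     // the call runs on a temporary copy of the field
        return _readOnly.Value;    // so the field itself is unchanged
    }

    public int IncrementMutable()
    {
        _mutable.Increment();      // the call runs on the field itself
        return _mutable.Value;
    }
}
```

Calling IncrementReadOnly leaves the stored value at 0 no matter how often it runs, whereas IncrementMutable counts up as expected. Applied to a disposable struct in a using statement, the same copy-then-mutate strategy would protect the value that is eventually disposed.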
However, that conflicts with our intuition about what ought to happen when the variable in the using statement is an enumerator.
We expect that doing a MoveNext inside a using block will move the enumerator to the next one regardless of whether it is a struct or a ref type.
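A sketch of the intuitive case, using List<int>.Enumerator as the struct enumerator (UsingEnumeratorDemo and FirstElement are hypothetical names). Here the using variable is a plain local, not one captured by a lambda or living in an iterator block.

```csharp
using System;
using System.Collections.Generic;

public static class UsingEnumeratorDemo
{
    // Assumes the list is non-empty.
    public static int FirstElement(List<int> list)
    {
        using (List<int>.Enumerator e = list.GetEnumerator())
        {
            e.MoveNext();          // we expect this to advance e itself, not a copy
            return e.Current;
        }
    }
}
```

If MoveNext were applied to a copy, Current would still hold its default value instead of the first element.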
Unfortunately, the C# compiler today has a bug. If you are in this situation we choose which strategy to follow inconsistently. The behaviour today is:
if the value-typed variable being mutated via a method is a normal local then it is mutated normally
but if it is a hoisted local (because it's a closed-over variable of an anonymous function or in an iterator block) then the local is actually generated as a read-only field, and the gear that ensures that mutations happen on a copy takes over.
Unfortunately, the spec provides little guidance on this matter. Clearly something is broken because we're doing it inconsistently, but it is not at all clear what the right thing to do is.
Struct methods can be inlined when the type of the struct is known at compile time, whereas calling a method through an interface is slower, so the answer is: for performance reasons.
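As a rough illustration of the difference in call shape, the sketch below sums the same list three ways. CallShapeDemo and its methods are hypothetical names, and whether the JIT actually inlines anything depends on the runtime, so treat the comments as describing what is possible rather than guaranteed.

```csharp
using System;
using System.Collections.Generic;

public static class CallShapeDemo
{
    // Static type is List<int>: foreach binds to the struct List<int>.Enumerator,
    // so MoveNext and Current are direct, non-virtual calls the JIT may inline.
    public static int SumConcrete(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // Static type is IEnumerable<int>: the enumerator is handed back as
    // IEnumerator<int>, and every step is an interface call.
    public static int SumViaInterface(IEnumerable<int> seq)
    {
        int sum = 0;
        foreach (int x in seq) sum += x;
        return sum;
    }

    // Generic over the enumerator type: when TEnumerator is a struct, the
    // enumerator is passed by value and is not boxed.
    public static int SumGeneric<TEnumerator>(TEnumerator e)
        where TEnumerator : IEnumerator<int>
    {
        int sum = 0;
        while (e.MoveNext()) sum += e.Current;
        return sum;
    }
}
```

All three produce the same sum. They differ only in how the enumerator is stored and called, which is where the performance argument comes from.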