Allow iteration without generating any garbage
I have the following code in an object pool that implements the IEnumerable interface.
    public IEnumerable<T> ActiveNodes
    {
        get
        {
            for (int i = 0; i < _pool.Count; i++)
            {
                if (_pool[i].AvailableInPool)
                {
                    yield return _pool[i];
                }
            }
        }
    }
As far as I know (according to this question), this will generate garbage as the IEnumerable object will need to be collected.
None of the elements in _pool will ever be collected, as the purpose of the pool is to keep references to all of them to prevent garbage creation.
Can anyone suggest a way to allow iteration over _pool so that no garbage is generated?
When iterating over the pool, all of the items in the pool that have AvailableInPool == true should be iterated over. Order doesn't matter.
First off, a number of people are pushing back on Olhovsky to suggest that this is worrying about nothing. Avoiding collection pressure is actually very important in some applications on some environments.
The compact framework garbage collector has an unsophisticated policy; it triggers a collection every time 1000KB of memory has been allocated. Now suppose you are writing a game that runs on the compact framework, and the physics engine generates 1KB of garbage every time it runs. Physics engines are typically run on the order of 20 times a second. So that's 1200KB of pressure per minute, and hey, that's already more than one collection per minute just from the physics engine. If the collection causes a noticeable stutter in the game then that might be unacceptable. In such a scenario, anything you can do to decrease collection pressure helps.
I am learning this myself the hard way, even though I work on the desktop CLR. We have scenarios in the compiler where we must avoid collection pressure, and we are jumping through all kinds of object pooling hoops to do so. Olhovsky, I feel your pain.
So, to come to your question, how can you iterate over the collection of pooled objects without creating collection pressure?
First, let's think about why collection pressure happens in the typical scenario. Suppose you have a foreach loop over pool.ActiveNodes.
Logically this allocates two objects. First, it allocates the enumerable -- the sequence -- that represents the sequence of nodes. Second, it allocates the enumerator -- the cursor -- that represents the current position in the sequence.
In practice sometimes you can cheat a bit and have one object that represents both the sequence and the enumerator, but you still have one object allocated.
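To make the allocation concrete, here is roughly what a foreach over an IEnumerable<T> does. This is a simplified sketch, not the compiler's exact output, and the names ForeachExpansion, SumExpanded and EvenNumbers are hypothetical. With Microsoft's C# compiler, an iterator method (one that uses yield return) typically produces a single generated object that plays both the sequence and the cursor roles, which is the "cheat" mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Roughly what `foreach (int item in source) sum += item;` does when the
    // static type of `source` is IEnumerable<int>.
    public static int SumExpanded(IEnumerable<int> source)
    {
        int sum = 0;
        // The cursor is a reference type here, so it lives on the heap.
        IEnumerator<int> cursor = source.GetEnumerator();
        try
        {
            while (cursor.MoveNext())
            {
                sum += cursor.Current;
            }
        }
        finally
        {
            cursor.Dispose();
        }
        return sum;
    }

    // An iterator method in the same style as the question's ActiveNodes.
    // Each call allocates one compiler-generated object on the heap.
    public static IEnumerable<int> EvenNumbers(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] % 2 == 0)
            {
                yield return values[i];
            }
        }
    }
}
```

Every call to EvenNumbers creates a fresh object that becomes garbage once the loop finishes, which is the collection pressure under discussion.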
How can we avoid this collection pressure? Three things come to mind.
1) Don't make an ActiveNodes method in the first place. Make the caller iterate over the pool by index and check for themselves whether each node is available. The sequence is then the pool, which is already allocated, and the cursor is an integer, neither of which creates new collection pressure. The price you pay is duplicated code.
2) As Steven suggests, the compiler will take any types that have the right public methods and properties; they don't have to be IEnumerable and IEnumerator. You can make your own mutable-struct sequence and cursor objects, pass those around by value, and avoid collection pressure. It is dangerous to have mutable structs, but it is possible. Note that List<T> uses this strategy for its enumerator; study its implementation for ideas.
3) Allocate the sequence and the enumerators on the heap normally and pool them too! You're already going with a pooling strategy, so there's no reason why you can't pool an enumerator as well. Enumerators even have a convenient "Reset" method that usually just throws an exception, but you could write a custom enumerator object that uses it to reset the enumerator back to the beginning of the sequence when it goes back in the pool.
Most objects are only enumerated once at a time, so the pool can be small in typical cases.
(Now, of course you may have a chicken-and-egg problem here; how are you going to enumerate the pool of enumerators?)
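As an illustration of option 1, here is a minimal sketch of index-based iteration. The types Node, IndexedPool and PoolWalker are hypothetical stand-ins for the asker's classes; the only assumption taken from the question is the AvailableInPool flag.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled item; only AvailableInPool comes from the question.
public sealed class Node
{
    public bool AvailableInPool;
    public int Value;
}

// Hypothetical pool that exposes a count and an indexer instead of ActiveNodes.
public sealed class IndexedPool
{
    private readonly List<Node> _pool = new List<Node>();

    public int Count => _pool.Count;
    public Node this[int index] => _pool[index];
    public void Add(Node node) => _pool.Add(node);
}

public static class PoolWalker
{
    // The "sequence" is the pool itself and the "cursor" is a plain int,
    // so the loop creates no new heap objects.
    public static int SumAvailable(IndexedPool pool)
    {
        int sum = 0;
        for (int i = 0; i < pool.Count; i++)
        {
            Node node = pool[i];
            if (node.AvailableInPool)
            {
                sum += node.Value;
            }
        }
        return sum;
    }
}
```

The filter on AvailableInPool has to be repeated in every loop written this way, which is the duplicated code mentioned as the price of this option.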
Iterating items will, in any 'normal' design, usually result in the creation of a new enumerable object. Creating and disposing of objects is very fast, so only in very special scenarios (where low latency is the topmost priority) could garbage collections (I say 'could') be a problem.
A design without garbage is possible by returning structures that don't implement IEnumerable. The C# compiler can still iterate such objects, because the foreach statement uses duck typing. .NET's List<T>, for instance, takes a similar approach: its public GetEnumerator method returns a struct.
When using foreach over either an array or a List<T>, no garbage will be generated. When using foreach on an array, C# will transform the operation to a for statement, while List<T> already implements a struct enumerator, causing the foreach to produce no garbage.
The design consists of a struct enumerable, StructEnumerable<T>, and a struct enumerator, StructEnumerator. The pool can simply return the StructEnumerable<T>, and C# can iterate over it with a normal foreach.
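The answer's original listings did not survive on this page, so the following is a reconstruction under stated assumptions, not the author's code. StructEnumerable<T> and StructEnumerator are names from the text; the IPoolable interface, Particle and ParticlePool are hypothetical, and the filter on AvailableInPool is taken from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface so the enumerator can read the flag from the question.
public interface IPoolable
{
    bool AvailableInPool { get; }
}

// Value-type sequence. It deliberately does not implement IEnumerable<T>,
// so it cannot be boxed by being used through that interface.
public struct StructEnumerable<T> where T : IPoolable
{
    private readonly List<T> _pool;

    public StructEnumerable(List<T> pool) { _pool = pool; }

    // foreach binds to this public method by pattern (duck typing).
    public StructEnumerator<T> GetEnumerator() => new StructEnumerator<T>(_pool);
}

// Value-type cursor: a list reference plus an index, held on the stack.
public struct StructEnumerator<T> where T : IPoolable
{
    private readonly List<T> _pool;
    private int _index;

    public StructEnumerator(List<T> pool)
    {
        _pool = pool;
        _index = -1;
    }

    public T Current => _pool[_index];

    // Advances to the next item whose AvailableInPool flag is set.
    public bool MoveNext()
    {
        while (++_index < _pool.Count)
        {
            if (_pool[_index].AvailableInPool) return true;
        }
        return false;
    }
}

public sealed class Particle : IPoolable
{
    public bool AvailableInPool { get; set; }
    public int Id { get; set; }
}

public sealed class ParticlePool
{
    private readonly List<Particle> _pool = new List<Particle>();

    public void Add(Particle particle) => _pool.Add(particle);

    // Returns a struct by value; no heap allocation is involved.
    public StructEnumerable<Particle> ActiveNodes => new StructEnumerable<Particle>(_pool);
}
```

A caller can then write foreach (var particle in pool.ActiveNodes) { ... } as usual. Because the enumerator is a mutable struct, copying it mid-iteration copies the cursor position as well, which is the danger with mutable structs noted in the first answer.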
Note that you can't LINQ over the item using System.Linq.Enumerable. You need the IEnumerable<T> interface for that, and that involves creating enumerators and, therefore, garbage collection. You could, of course, build your own LINQ extension methods, but that is unlikely to help, because it will often still result in new objects being created (when closures are generated for the delegates used).
UPDATE (2024): Newer .NET (Core) versions contain many optimizations to the LINQ methods that in some cases make them extraordinarily fast and in some cases produce no garbage, especially when LINQ methods are used to filter arrays and List<T>. This means that you should benchmark the performance of your code before reverting from LINQ operations to manually written code, because the LINQ operations might very well be orders of magnitude faster than anything you can reasonably write yourself.
Since XNA for XBox also works over the Compact Framework (and I suspect that's what you're working on given the hints you've given(1)), we can trust the XNA devs to teach us exactly when foreach creates garbage.
To quote the most relevant point (although the entire article's worth reading):
So, if _pool is a List<T>, array or similar, and you can afford to, you can either return that type directly or cast the IEnumerable<T> to the respective type to avoid garbage during the foreach.
For additional reading, Shawn Hargreaves may have some useful information.
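The difference between the two cases comes down to the static type the foreach sees. The sketch below uses hypothetical names (ListForeach, SumDirect, SumViaInterface) and describes the classic behaviour; newer runtimes may optimize the interface case, so measure on your own platform.

```csharp
using System;
using System.Collections.Generic;

public static class ListForeach
{
    // Static type is List<int>: foreach binds to List<int>.GetEnumerator(),
    // which returns the struct List<int>.Enumerator by value.
    public static int SumDirect(List<int> values)
    {
        int sum = 0;
        foreach (int value in values) sum += value;
        return sum;
    }

    // Static type is IEnumerable<int>: foreach calls the interface method,
    // which returns IEnumerator<int>, so the struct enumerator gets boxed.
    public static int SumViaInterface(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int value in values) sum += value;
        return sum;
    }
}
```

Both methods compute the same result; only the second one hands the loop a heap-allocated enumerator.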
(1) 60 calls per second, Compact Framework, can't go down to native code, 1MB of allocation before a GC is triggered.
Does it have to be IEnumerable? Will refactoring to an array with good old indexed access help?
Don't be afraid of those tiny garbage objects. The ActiveNodes in the pool will (should) be much more costly to create, so if you avoid recreating them, that should be sufficient.
@Edit: if you have to use a managed platform and really want to achieve a zero-garbage state, avoid using the pool in a foreach loop and iterate over it in another manner, possibly utilizing an indexer. Or consider creating a list of potential nodes and returning that instead.
@Edit2: Of course, implementing IEnumerator and using Current, MoveNext() and so on would work as well.
You could implement your own IEnumerator class which will enumerate over these active nodes.
Now if you can guarantee that only one client will be enumerating over the active nodes at any one time, your class can cache this enumerator so only a single instance exists. Then no garbage will need collecting. The call to ActiveNodes will call Reset to start enumerating from the start.
This is a dangerous assumption to make, but if you are optimising you may be able to consider it.
If you have multiple clients enumerating over these nodes at any time, then each will need its own instance of IEnumerator to be able to store their current cursor position in the collection. In which case these will need to be created and collected with each call - and you may as well stick with your original design.
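A minimal sketch of the cached-enumerator idea follows, assuming at most one enumeration is in progress at a time. Bullet, BulletPool and ActiveEnumerator are hypothetical names, and the enumerator relies on the foreach pattern rather than implementing IEnumerator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled item; only AvailableInPool comes from the question.
public sealed class Bullet
{
    public bool AvailableInPool;
    public int Id;
}

public sealed class BulletPool
{
    private readonly List<Bullet> _pool = new List<Bullet>();
    private readonly ActiveEnumerator _cached;

    public BulletPool() { _cached = new ActiveEnumerator(_pool); }

    public void Add(Bullet bullet) => _pool.Add(bullet);

    // Returns the single cached cursor, rewound to the start.
    // ASSUMPTION: only one loop runs at a time. Nested or concurrent
    // loops would share this cursor and corrupt each other.
    public ActiveEnumerator ActiveNodes
    {
        get
        {
            _cached.Reset();
            return _cached;
        }
    }

    public sealed class ActiveEnumerator
    {
        private readonly List<Bullet> _pool;
        private int _index = -1;

        internal ActiveEnumerator(List<Bullet> pool) { _pool = pool; }

        // Lets foreach treat this one object as both sequence and cursor.
        public ActiveEnumerator GetEnumerator() => this;

        public Bullet Current => _pool[_index];

        public bool MoveNext()
        {
            while (++_index < _pool.Count)
            {
                if (_pool[_index].AvailableInPool) return true;
            }
            return false;
        }

        public void Reset() { _index = -1; }
    }
}
```

The only allocation happens once, in the BulletPool constructor. A nested foreach over ActiveNodes would silently restart the outer loop's cursor, which is why the single-client assumption is dangerous.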
Also, you can have a pool of preallocated enumerators. Think about how many simultaneous enumerations you want to support.
The garbage collection overhead will go, at the expense of extra memory consumption. Speed vs. memory optimization dilemma in its purest form.
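One way to sketch a pool of preallocated enumerators is shown below. Enemy, EnemyPool, Cursor and the maxSimultaneousEnumerations parameter are all hypothetical. The sketch relies on foreach calling Dispose on an enumerator that implements IDisposable, which is used here to hand the cursor back to the pool. It is not thread-safe and does not guard against a cursor being disposed twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled item; only AvailableInPool comes from the question.
public sealed class Enemy
{
    public bool AvailableInPool;
    public int Id;
}

public sealed class EnemyPool
{
    private readonly List<Enemy> _pool = new List<Enemy>();
    private readonly Stack<Cursor> _idleCursors;

    public EnemyPool(int maxSimultaneousEnumerations)
    {
        // Pay for every cursor up front, trading memory for zero garbage later.
        _idleCursors = new Stack<Cursor>(maxSimultaneousEnumerations);
        for (int i = 0; i < maxSimultaneousEnumerations; i++)
        {
            _idleCursors.Push(new Cursor(this));
        }
    }

    public void Add(Enemy enemy) => _pool.Add(enemy);

    public int IdleCursorCount => _idleCursors.Count;

    // `foreach (var enemy in pool)` binds to this method by pattern.
    public Cursor GetEnumerator()
    {
        // Falls back to allocating if more loops are active than were preallocated.
        Cursor cursor = _idleCursors.Count > 0 ? _idleCursors.Pop() : new Cursor(this);
        cursor.Rewind();
        return cursor;
    }

    public sealed class Cursor : IDisposable
    {
        private readonly EnemyPool _owner;
        private int _index = -1;

        internal Cursor(EnemyPool owner) { _owner = owner; }

        public Enemy Current => _owner._pool[_index];

        public bool MoveNext()
        {
            while (++_index < _owner._pool.Count)
            {
                if (_owner._pool[_index].AvailableInPool) return true;
            }
            return false;
        }

        internal void Rewind() { _index = -1; }

        // foreach calls Dispose when the loop ends; return the cursor to the pool.
        public void Dispose() { _owner._idleCursors.Push(this); }
    }
}
```

With two preallocated cursors, a loop nested inside another loop over the same pool works without allocating. A third level of nesting would take the fallback path and allocate one more cursor.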