GC.Collect on only generation 2 and the large object heap
In my application there is a specific time when a number of large objects are all released at once. At that time I would like to do a garbage collection on specifically the large object heap (LOH).
I'm aware that you cannot do that; you must call GC.Collect(2), because the GC is only invoked on the LOH when it is doing a generation 2 collection. However, I've read in the documentation that calling GC.Collect(2) would still run a GC on generations 1 and 0.
Is it possible to force the GC to only collect gen 2, and not include gen 1 or gen 0?
If it is not possible, is there a reason for the GC to be designed that way?
Answers (4)
It's not possible. The GC is designed so that a generation 2 collection always also collects generation 0 and 1.
Edit: Found you a source for this on a GC developer's blog:
Edit 2: According to Using GC Efficiently Part 1 and Part 2 on the same blog, Gen0 and Gen1 collections are fast compared to a Gen2 collection, so it seems reasonable to me that collecting only Gen2 wouldn't be of much performance benefit. There might be a more fundamental reason, but I'm not sure. Maybe the answer is in some article on that blog.
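One way to observe this behaviour is to compare GC.CollectionCount for each generation before and after a forced generation 2 collection. The following is a minimal sketch; the class and method names (FullCollectionDemo, CountsAddedByGen2Collect) are made up for illustration, and it assumes no special latency mode is suppressing collections.

```csharp
using System;

public static class FullCollectionDemo
{
    // Returns, per generation, how many collections were recorded
    // while a single forced GC.Collect(2) call ran.
    public static int[] CountsAddedByGen2Collect()
    {
        int[] before = Snapshot();

        // Forced, blocking collection of generation 2 (and, by design, 1 and 0).
        GC.Collect(2, GCCollectionMode.Forced, true);

        int[] after = Snapshot();
        var delta = new int[before.Length];
        for (int gen = 0; gen < before.Length; gen++)
        {
            delta[gen] = after[gen] - before[gen];
        }
        return delta;
    }

    // Reads the current collection count of every generation.
    private static int[] Snapshot()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen < counts.Length; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }
}
```

Calling FullCollectionDemo.CountsAddedByGen2Collect() should report an increase for generations 0, 1 and 2 alike, which matches the statement that a generation 2 collection also collects the younger generations.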
Since all new allocations (other than for large objects) always go in Gen0, the GC is designed to always collect from the specified generation and below. When you call GC.Collect(2), you are telling the GC to collect from Gen0, Gen1, and Gen2.
If you are certain you are dealing with a lot of large objects (objects that at allocation time are large enough to be placed on the LOH), the best option is to ensure that you set the references to them to null (Nothing in VB) when you are done with them. LOH allocation attempts to be smart and reuse blocks. For example, if you allocated a 1MB object on the LOH, then disposed of it and set the reference to null, you would be left with a 1MB "hole" once it has been collected. The next time you allocate anything on the LOH that is 1MB or smaller in size, it will fill in that hole (and keep filling it in until the next allocation is too large to fit in the remaining space, at which point it will allocate a new block).
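As a rough illustration of the two points above, the sketch below checks which generation a freshly allocated array reports and whether a large buffer is reclaimed once its last reference is gone. It assumes the default LOH threshold of 85,000 bytes; LohDemo and its members are hypothetical names, and a WeakReference is used only to observe the object without keeping it alive.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LohDemo
{
    // Assumption: comfortably above the default 85,000-byte LOH threshold.
    public const int LargeSize = 100000;

    // Reports the generation the runtime assigns to a new byte array.
    // Arrays placed on the LOH are reported as generation 2.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        var buffer = new byte[sizeInBytes];
        return GC.GetGeneration(buffer);
    }

    // Allocates a large buffer and returns only a weak reference to it,
    // which has the same effect as setting the last strong reference to null.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateAndDrop()
    {
        return new WeakReference(new byte[LargeSize]);
    }

    // True if the dropped buffer was reclaimed by a full collection.
    public static bool IsReclaimedAfterFullCollect()
    {
        WeakReference tracker = AllocateAndDrop();
        GC.Collect(2, GCCollectionMode.Forced, true);
        return !tracker.IsAlive;
    }
}
```

LohDemo.GenerationOfNewArray(LohDemo.LargeSize) is expected to return GC.MaxGeneration, since LOH objects are only collected together with generation 2.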
Keep in mind that generations in .NET are not physical things, but logical separations that help increase GC performance. Since all new allocations go in Gen0, that is always the first generation to be collected. On each collection cycle, anything in a lower generation that survives the collection is "promoted" to the next higher generation (until it reaches Gen2).
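Promotion can be watched with GC.GetGeneration by keeping one object alive across several forced collections. This is only a sketch: PromotionDemo and GenerationsAcrossCollections are illustrative names, and the exact sequence of generations is up to the runtime.

```csharp
using System;

public static class PromotionDemo
{
    // Records the generation of one surviving object before any collection
    // and again after each of the requested forced collections.
    public static int[] GenerationsAcrossCollections(int collections)
    {
        object survivor = new object();
        var seen = new int[collections + 1];
        seen[0] = GC.GetGeneration(survivor);

        for (int i = 1; i <= collections; i++)
        {
            GC.Collect(); // full collection; the survivor is still referenced
            seen[i] = GC.GetGeneration(survivor);
        }

        GC.KeepAlive(survivor); // make sure the object is reachable until here
        return seen;
    }
}
```

With three collections the recorded generations typically climb from 0 to 1 to 2 and then stay at 2, because there is no generation above Gen2 to be promoted into.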
In most cases, the GC doesn't need to go beyond collecting Gen0. The current implementation of the GC can collect Gen0 and Gen1 together, but it cannot run a Gen0 or Gen1 collection while a Gen2 collection is in progress. (.NET 4.0 relaxes this constraint a great deal: for the most part, Gen0 and Gen1 collections can proceed while a Gen2 collection is running.)
To answer the question "why": physically, there is no such thing as Gen0, Gen1 or Gen2. They all use the same memory block(s) in the virtual address space. The distinction between them is made only virtually, by moving an imaginary boundary.
Every (small) object is allocated from the Gen0 heap area. If it survives a collection, it is moved "downwards" into the area of the managed heap block that has just been freed of garbage. This is done by compacting the heap. After the collection finishes, the new "border" for Gen1 is set to the space right after those surviving objects.
So if you tried to clear just Gen0 and/or Gen1, you would open up holes in the heap which would have to be closed by compacting the "full" heap, even objects in Gen0. Obviously this would not make any sense, since most of those objects would be garbage anyway. There is no point in moving them around, and no point in creating and leaving large holes on the (otherwise compacted) heap.
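The "moving border" idea can be sketched with a toy model in which the heap is a list in address order and the generations are nothing more than two indices into it. This is a deliberately simplified illustration, not how the CLR is implemented; ToyHeap and all of its members are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class ToyHeap
{
    // Objects in address order, oldest first. Live == false means garbage.
    private readonly List<(string Name, bool Live)> _slots =
        new List<(string Name, bool Live)>();

    // [0, Gen1Start) is Gen2, [Gen1Start, Gen0Start) is Gen1, the rest is Gen0.
    public int Gen1Start { get; private set; }
    public int Gen0Start { get; private set; }

    // New objects are always appended at the young end of the heap.
    public void Allocate(string name) => _slots.Add((name, true));

    // Marks an object as unreachable.
    public void Release(string name)
    {
        int index = _slots.FindIndex(s => s.Name == name);
        if (index >= 0) _slots[index] = (name, false);
    }

    // Collects generation 'gen' and every younger one by compacting the tail
    // of the heap, then moves the borders so that survivors are promoted.
    public void Collect(int gen)
    {
        int from = gen >= 2 ? 0 : gen == 1 ? Gen1Start : Gen0Start;
        int olderSurvivors = 0; // survivors that were already Gen1 or Gen2
        var survivors = new List<(string Name, bool Live)>();
        for (int i = from; i < _slots.Count; i++)
        {
            if (!_slots[i].Live) continue;
            survivors.Add(_slots[i]);
            if (i < Gen0Start) olderSurvivors++;
        }

        _slots.RemoveRange(from, _slots.Count - from);
        _slots.AddRange(survivors); // compaction: survivors slide downwards

        if (gen >= 1) Gen1Start = from + olderSurvivors; // old Gen1 becomes Gen2
        Gen0Start = _slots.Count;                        // old Gen0 becomes Gen1
    }

    // Returns 0, 1 or 2 based purely on position, or -1 if the object is gone.
    public int GenerationOf(string name)
    {
        int index = _slots.FindIndex(s => s.Name == name);
        if (index < 0) return -1;
        return index >= Gen0Start ? 0 : index >= Gen1Start ? 1 : 2;
    }
}
```

In this model an object never carries a generation number of its own; its generation follows entirely from which side of the two borders it ends up on after compaction.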
Whenever the system performs a garbage-collection of a particular generation, it must examine every single object that might hold a reference to any object of that generation. In many cases, old objects will only hold references to other old objects; if the system is doing a Gen0 collection it can ignore any objects which only hold references to those of Gen1 and/or Gen2. Likewise if it's doing a Gen1 collection it can ignore any objects which only hold references to Gen2. Since examination and tagging of objects represents a large portion of the time required for garbage collection, being able to skip older objects entirely represents a considerable time savings.
Incidentally, if you're wondering how the system "knows" whether an object might hold references to newer objects, the system has special code to set a couple of bits in each object's descriptor if the object is written. The first bit is reset at each garbage collection, and if it's still reset at the next garbage collection the system will know the object can't contain any references to Gen0 objects (since any objects that existed when the object was last written and weren't cleared out by the previous collection will be Gen1 or Gen2). The second bit is reset at each Gen1 garbage collection, and if it's still reset at the next Gen1 garbage collection, the system will know the object can't contain any references to Gen0 or Gen1 objects (any objects to which it holds references are now Gen2).
Note that the system doesn't know or care whether the information that was written to an object included a Gen0 or Gen1 reference. The trap required when writing to an untagged object is expensive, and would greatly impede performance if it had to be handled every time an object is written. To avoid this, objects are tagged whenever any write occurs, so that any additional writes before the next garbage collection can proceed without interruption.
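The tagging scheme described here can be modelled in a few lines. The sketch below covers only the first bit (the one reset at every collection) and is a toy model of the idea as this answer describes it, not the runtime's actual data structures; TrackedObject and Gen0Scan are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class TrackedObject
{
    private readonly List<TrackedObject> _references = new List<TrackedObject>();

    public int Generation { get; set; }

    // The "tag" bit: set on any write, cleared when a collection finishes.
    public bool WrittenSinceLastGc { get; private set; }

    // Storing a reference tags the object without inspecting what was written.
    public void Write(TrackedObject target)
    {
        WrittenSinceLastGc = true;
        _references.Add(target);
    }

    public void ClearTag() => WrittenSinceLastGc = false;
}

public static class Gen0Scan
{
    // Older objects that a Gen0 collection still has to examine:
    // only those written to since the previous collection.
    public static List<TrackedObject> OldObjectsToScan(IEnumerable<TrackedObject> heap)
    {
        var result = new List<TrackedObject>();
        foreach (TrackedObject obj in heap)
        {
            if (obj.Generation > 0 && obj.WrittenSinceLastGc)
            {
                result.Add(obj);
            }
        }
        return result;
    }

    // Resets every tag once the collection is done.
    public static void FinishCollection(IEnumerable<TrackedObject> heap)
    {
        foreach (TrackedObject obj in heap)
        {
            obj.ClearTag();
        }
    }
}
```

An old object that was written to is scanned even if the value stored was a reference to another old object, which mirrors the point that the system does not check what was written, only that a write happened.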