什么是“大多数并发垃圾收集器”?

发布于 2024-11-16 14:28:59 字数 102 浏览 5 评论 0原文

我知道停止世界、增量、并行、并发、(软/硬)实时垃圾收集器的概念。但我无法理解大部分并发 GC。和并发GC有什么不同吗?有什么区别?为什么它被称为大部分

I know concepts of stop-the-world, incremental, parallel, concurrent, (soft/hard) realtime garbage collectors. But I can't understanding mostly-concurrent GC. Is it different with concurrent GC? What's the difference? Why it is called mostly?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

永言不败 2024-11-23 14:28:59

我知道停止世界、增量、并行、并发、(软/硬)实时垃圾收集器的概念。但我无法理解大多数并发GC。和并发GC有什么不同吗?有什么区别?为什么叫“大多数”?

与许多其他主题一样,垃圾收集笼罩在术语模糊的迷雾中。伯姆因以非传统方式使用传统术语而臭名昭著,但我们应该原谅他,因为他在传统含义尚未僵化的时代开创了该领域! :-)

据我了解,stop-the-world GC 是指在 GC 周期的整个持续时间内(例如标记整个堆时)挂起所有变异器线程的算法。例如,.NET Server GC 会执行此操作,并因此导致 300 毫秒的巨大暂停时间。增量GC 在每个次要GC 周期执行一点主要GC 工作,例如OCaml GC 中的“主要切片”。并行意味着GC使用多个线程来加速垃圾收集过程。并发GC是指GC与mutators同时运行,例如.NET工作站GC。实时很难定义,最初意味着有界的最大暂停时间,但现在也意味着最小变异器利用率(MMU),以避免GC从不允许变异器运行而长时间暂停变异器的病态问题!根据 Richard Jones 的新书,即时 GC 一次不会挂起多个变体(即不存在停止世界阶段),尽管我怀疑他的意思是变体是彼此独立挂起的。最后,大多数并发 GC 会同时挂起所有赋值器线程,但仅挂起很短的时间,而不是任意长的 GC 周期。因此,在 GC 运行时,变异器大部分时间都可以自由运行,因此被称为“大部分并发”GC。

“大部分并发”的分类很重要,因为大多数(全部?)主要 GC 都属于这一类,因为它在暂停时间和吞吐量之间提供了良好的权衡。例如,.NET 工作站 GC 在拍摄全局根快照时会挂起所有变异器线程,但在标记和清除时会恢复它们。

I know concepts of stop-the-world, incremental, parallel, concurrent, (soft/hard) realtime garbage collectors. But I can't understanding mostly-concurrent GC. Is it different with concurrent GC? What's the difference? Why it is called mostly?

Like so many other subjects, garbage collection is shrouded in a fog of terminological ambiguity. Boehm is particularly infamous for using conventional terms in unconventional ways but we should forgive him because he was pioneering the field at a time when the conventional meanings had not yet been ossified! :-)

As I understand it, stop-the-world GC refers to an algorithm that suspends all mutator threads for the entire duration of a GC cycle, e.g. when marking the entire heap. For example, the .NET Server GC does this and incurs huge 300ms pause times as a consequence. Incremental GCs perform a little bit of major GC work at each minor GC cycle, e.g. "major slices" in OCaml's GC. Parallel means the GC uses multiple threads to speedup the process of collecting garbage. Concurrent GC means the GC runs at the same time as the mutators, e.g. .NET workstation GC. Real-time is difficult to define, originally meant bounded maximum pause times but now also means minimum mutator utilization (MMU), to avoid the pathological problem of a GC that never pauses a mutator for long by never allowing it to run! According to Richard Jones' new book, an on-the-fly GC never suspends more than one mutator at a time (i.e. there is no stop-the-world phase) although I suspect he meant that mutators are suspended independently of each other. Finally, a mostly-concurrent GC is one that does suspend all mutator threads simultaneously but only for a short period of time and not for an arbitrarily-long GC cycle. Therefore, the mutators are allowed to run freely most of the time while the GC is running and, hence, it is called "mostly concurrent" GC.

The classification of "mostly concurrent" is important because most (all?) major GCs fall into this category because it provides a good trade-off between pause times and throughput. For example, the .NET workstation GC suspends all mutator threads when taking a snapshot of the global roots but resumes them while it marks and sweeps.

情定在深秋 2024-11-23 14:28:59

您可以在论文“大部分并行垃圾”中找到易于理解的描述Collection”,作者:Bohem、Demers 和 Shenker(ACM SIGPLAN '91 编程语言设计和实现会议记录,SIGPLANNotices 26, 6(1991 年 6 月),第 157-164 页)。

他们写道:

假设我们能够维护一组虚拟脏位,它们是
每当相应的虚拟内存页面自动设置
被写到。 (此功能的可接受的实现可以是
通过写保护页并捕获结果写入来获得
故障,无需修改底层操作系统内核;一个
在操作系统内核中的实现当然会更有效。)
对于任何为 stop-the-world 操作定义的跟踪收集器,
考虑以下收集算法。年初时
集合,清除所有虚拟脏位。执行传统
与变元并行的跟踪操作。虚拟脏位
将更新以反映变异器写入。追踪完成后
完成,停止世界并追踪所有标记的物体
脏页。 (寄存器被认为是脏的。)此时,所有
标记了可到达的对象,并且可以安全地回收垃圾。

...

在此算法中,并行跟踪阶段提供了
真实可达集的近似值。唯一未标记的对象
这个确实可达的并行跟踪过程必须是
可从自创建以来已写入的标记对象中访问
追踪到。停止世界追踪阶段追踪所有此类对象,
因此最终没有真正可到达的对象未被标记。

当他们提到跟踪垃圾收集器时,他们指的是启动的收集器从指定的“根节点”(通常是程序的寄存器)开始,并跟随指向每个可到达对象的指针。一切无法到达的东西都是垃圾。

简而言之,大部分并行的收集器并行完成大部分工作,然后停止程序的执行以更正收集器运行时程序所做的任何更改。因此,它是“大部分是并行的”。

You can find an accessible description in the paper "Mostly Parallel Garbage Collection" by Bohem, Demers, and Shenker (Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, SIGPLAN Notices 26, 6 (June 1991), pp. 157-164).

They write:

Assume we are able to maintain a set of virtual dirty bits, which are
automatically set whenever the corresponding pages of virtual memory
are written to. (An acceptable implementation of this feature can be
obtained by write-protecting pages and catching the resulting write
faults, with no modifications to the underlying OS kernel; an
implementation in the OS kernel would of course be more efficient.)
For any tracing collector defined for stop- the-world operation,
consider the following collection algorithm. At the beginning of the
collection, clear all virtual dirty bits. Perform the traditional
tracing operation in parallel with the mutator. The virtual dirty bits
will be updated to reflect mutator writes. After the tracing is
complete, stop the world and trace from all marked objects that lie on
dirty pages. (Registers are considered dirty.) At this point, all
reachable objects are marked, and garbage can safely be reclaimed.

...

In this algorithm, the parallel tracing phase provides an
approximation to the true reachable set. The only objects unmarked by
this parallel tracing process which are indeed reachable must be
reachable from marked objects which have been written since being
traced. The stop-the-world tracing phase traces from all such objects,
so that in the end no truly reachable objects remain unmarked.

When they refer to tracing garbage collectors, they are referring to collectors that start from designated "root nodes" (usually the program's registers) and follow pointers to every reachable object. Everything unreachable is garbage.

In short, a mostly-parallel collector does the bulk of the work in parallel, then halts the program's execution to correct any changes that the program made while the collector was running. Hence, it is "mostly parallel."

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文