Possible pitfalls of this asynchronous garbage collector
I've been studying a little about garbage collection, mostly as applied to server-side / real-time applications, and I've started to sketch an algorithm with which it would be possible to have an asynchronous garbage collection system. Since I'm just starting on this topic, I don't know gc algorithms very deeply, and I was wondering about the possible pitfalls of an implementation like this. The algorithm is very crude and has many undefined parts.
Here's how I thought about it:
- each thread has its own heap space to manage, and stores a list of pointers it owns that are in use by other threads

With that, garbage collection works totally asynchronously to the running threads, and:

- phase 1 starts by following the threads' roots and marking all objects reachable from them. If we cross into another thread's space, we stop following that pointer and instead mark it as "in use" on the owner thread
- after we have marked all regions, we select a region (ideally the one with the most dead references) and start copying its live objects to another space (it could be the whole thread heap space, but I think that operation might be too memory-intensive)
- copying starts by setting, with CAS, a flag which states that the object is being copied. Any mutating operation on this particular object while that flag is set will spin-lock until a new address is set by the gc thread. When copying finishes, the new address is written into the old one, and any mutating access to the object is routed to the new object
- after updating all references to those pointers using CAS, the old space is finally freed (no pointer will be updated with the wrong address, since every mutator first checks whether the reference has changed location); a minimal sketch of this protocol follows the list
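Here's a minimal sketch of the copy protocol I have in mind, written with C++11 atomics. The header layout, the BUSY sentinel, and the function names are just placeholders of mine, not a finished design; note also that it leaves open the window where a mutator resolves the address just before the gc thread sets the flag and is still writing to the old copy, so a real design would need an extra write protocol there.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <thread>

// Object header: 'forward' is 0 while the object lives at its original
// address, BUSY while the gc thread is copying it, and otherwise holds
// the new address.
struct ObjectHeader {
    std::atomic<std::uintptr_t> forward;
    std::size_t size;  // total size of the object, header included
};

constexpr std::uintptr_t BUSY = 1;

// GC thread: claim the object with CAS, copy it, then publish the new
// address in the old header.
ObjectHeader* gc_copy(ObjectHeader* obj, void* to_space) {
    std::uintptr_t expected = 0;
    if (!obj->forward.compare_exchange_strong(expected, BUSY,
                                              std::memory_order_acq_rel)) {
        return reinterpret_cast<ObjectHeader*>(expected);  // already moved
    }
    auto* copy = static_cast<ObjectHeader*>(to_space);
    // Mutators spin while BUSY, so the object is frozen during the copy.
    // (memcpy over a struct containing an atomic is informal; a real
    // implementation would copy the payload field by field.)
    std::memcpy(copy, obj, obj->size);
    copy->forward.store(0, std::memory_order_relaxed);
    // Release store so the copied payload is visible to any thread that
    // reads the forwarding address.
    obj->forward.store(reinterpret_cast<std::uintptr_t>(copy),
                       std::memory_order_release);
    return copy;
}

// Mutator side: resolve the current location of an object before any
// mutating access, spinning only while the copy flag is set.
ObjectHeader* resolve(ObjectHeader* obj) {
    for (;;) {
        std::uintptr_t f = obj->forward.load(std::memory_order_acquire);
        if (f == 0) return obj;                                     // not moved
        if (f != BUSY) return reinterpret_cast<ObjectHeader*>(f);   // moved
        std::this_thread::yield();  // copy in progress: spin/yield
    }
}
```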
That's it!
Anyway, I'm quite excited about a possible implementation that doesn't stop the world, and that uses only fast spin-locks applying solely to the object being copied. But I'd like to know whether this is possible to implement, whether there's any possibility of a dangling pointer somewhere, or memory leaks, or whether it's inefficient, etc. Any info that contributes to this will be greatly appreciated!
I'm not at all sure how it would handle, for example, circular references spanning different threads. I think this would be handled naturally, since we update all hazard pointers held by the thread currently being gc'ed.
There might also be some kind of concurrent access I wasn't considering.
Thank you!
---- EDIT:
Thanks to Ernest's contribution, I'm thinking about not using a copying algorithm, but maybe a simple mark & sweep. That's because with copying we'd need to check, every time we access an object's variable, whether the pointer has been updated. That seems like quite a big overhead to me, doesn't it?
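For what it's worth, the overhead I'm worried about looks roughly like this (reusing the hypothetical `ObjectHeader` and `resolve()` from the sketch above): with a copying scheme, every field access has to go through a forwarding check, i.e. a read barrier.

```cpp
// Every single field access pays an extra atomic load plus a branch
// (and a potential spin) before it can touch the data.
int read_field(ObjectHeader* obj, std::size_t offset) {
    obj = resolve(obj);  // the per-access check discussed above
    return *reinterpret_cast<const int*>(
        reinterpret_cast<const char*>(obj) + sizeof(ObjectHeader) + offset);
}
```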
Comments (3)
Your ideas are good but I believe there are lots of subtle problems that will require a lot of work to solve and, once solved, you won't be able to attain competitive throughput performance.
If there is a shared object, which thread owns it? If a concurrent collection has objects added to it from different threads, will there be many inter-heap pointers? How do you handle cycles between heaps?
If this is done concurrently with the mutator running, how do you prevent race conditions where the mutators alter the topology to introduce or eliminate inter-heap references while the GC is doing this marking?
If this is for multicore, copying will saturate global memory bandwidth and destroy scalability.
Lock-free is tricky. You're making assumptions about memory models. You may need memory barriers. It looks like you're just emulating a lock, in which case you're talking about acquiring and releasing a lock around every write of a reference into the heap, which would be prohibitively expensive.
Spin locks are also bad for concurrency because you're not only tying up a thread but also burning a core that can no longer do the work you're waiting on! See the last-core slowdown bug in GHC. A wait-free solution would use CAS to get the mutator thread to help with the GC work rather than block waiting for another thread to do it.
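To illustrate the "help instead of wait" idea, here's a rough sketch of my own (not GHC's actual mechanism): the forwarding word is claimed with CAS, and any thread that finds a copy pending races to finish it rather than spinning. It assumes the object is not mutated while a copy is pending.

```cpp
#include <atomic>
#include <cstring>

struct HelpHeader {
    std::atomic<HelpHeader*> forward{nullptr};  // destination, once claimed
    std::atomic<bool> ready{false};             // destination fully copied?
    std::size_t size;                           // payload size in bytes
};

inline char* payload(HelpHeader* h) { return reinterpret_cast<char*>(h + 1); }

// 'dest' is a destination the caller pre-allocated in to-space; if this
// thread loses the claim race, its dest is simply wasted in this sketch.
HelpHeader* resolve_helping(HelpHeader* obj, HelpHeader* dest) {
    HelpHeader* fwd = obj->forward.load(std::memory_order_acquire);
    if (fwd == nullptr) {
        // Race to claim the destination; the loser sees the winner's.
        if (obj->forward.compare_exchange_strong(fwd, dest,
                                                 std::memory_order_acq_rel))
            fwd = dest;
    }
    if (!fwd->ready.load(std::memory_order_acquire)) {
        // Help finish the copy instead of blocking on another thread.
        // (Concurrent helpers write identical bytes; a real collector would
        // copy in CAS-able words to keep this formally race-free.)
        fwd->size = obj->size;
        std::memcpy(payload(fwd), payload(obj), obj->size);
        fwd->ready.store(true, std::memory_order_release);
    }
    return fwd;
}
```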
The major issue I see is synchronization. You need memory barriers to ensure that the various threads see up-to-date data on each other's heaps, and I don't really see where those would go while still maintaining your fully asynchronous operational model.
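As a concrete instance (a hedged sketch with made-up names): merely publishing a freshly initialized object from one thread's heap so another thread's collector can see it already requires release/acquire ordering, i.e. exactly the barriers that cut into the fully asynchronous model.

```cpp
#include <atomic>

struct Node { int payload; };

std::atomic<Node*> shared_slot{nullptr};  // e.g. an "in use" list entry

void publisher(Node* n) {  // owner thread
    n->payload = 42;       // plain initializing write
    shared_slot.store(n, std::memory_order_release);  // barrier needed here
}

void consumer() {          // another thread, or its gc
    Node* n = shared_slot.load(std::memory_order_acquire);  // ...and here
    if (n) {
        // Without the release/acquire pair, this read could observe a
        // stale, uninitialized payload.
        int v = n->payload;
        (void)v;
    }
}
```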
Just saw this recently http://java-monitor.com/forum/showthread.php?t=890
As far as I understand, you are talking about a model similar to the one used by the Erlang VM - each thread has its own heap. This is possible by Erlang's nature - no spin locks are required (at least for the thread heap).
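To make the comparison concrete, a rough sketch of the Erlang-style discipline (illustrative only, not the actual BEAM implementation): each heap is touched only by its owning thread, and cross-thread "sends" deep-copy into the receiver's heap, which is why no locks are needed on the thread heap.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

class ThreadHeap {
    std::vector<char> arena_;
    std::size_t top_ = 0;
public:
    explicit ThreadHeap(std::size_t bytes) : arena_(bytes) {}

    // Bump allocation, no atomics: only the owning thread touches this heap.
    void* alloc(std::size_t n) {
        void* p = arena_.data() + top_;
        top_ += n;  // out-of-memory check omitted in this sketch
        return p;
    }

    // A "message send" deep-copies into the receiver's heap (as the Erlang
    // VM does for messages), so heaps never share pointers and each can be
    // collected independently, with no inter-heap references to track.
    void* receive_copy(const void* msg, std::size_t n) {
        void* p = alloc(n);
        std::memcpy(p, msg, n);
        return p;
    }
};
```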