How can a garbage collector be faster than explicit memory deallocation?
I was reading this generated HTML (it may expire; here is the original PS file):
GC Myth 3: Garbage collectors are always slower than explicit memory deallocation.
GC Myth 4: Garbage collectors are always faster than explicit memory deallocation.
This was a big WTF for me. How could GC be faster than explicit memory deallocation? Isn't it essentially calling an explicit memory deallocator when it frees the memory and makes it available for use again? So... what does this actually mean?
Very small objects & large sparse heaps ==> GC is usually cheaper, especially with threads
I still don't understand it. It's like saying C++ is faster than machine code (if you don't see the WTF in that sentence, please stop programming; let the -1s begin). After a quick google, one source suggested it's faster when you have a lot of memory. What I think that means is that it doesn't bother with free at all. Sure, that can be fast, and I have written a custom allocator that does that very thing: it never frees at all (void free(void* p) {} and memory is released only at termination) in one application that never frees any objects, and it has the definition mostly for the sake of libs and things like the STL. So... I'm pretty sure this would be faster than GC as well. If I still want freeing, I guess I could use an allocator backed by a deque, or its own implementation, which is essentially
if (freeptr < someaddr) {
    *freeptr = ptr;        // buffer the freed pointer
    ++freeptr;
} else {
    freestuff();           // flush: release everything buffered so far
    freeptr = freeptrroot;
}
which I'm sure would be really fast. I sort of answered my own question already: the case where the GC collector is never called is the case where it would be faster. But I'm sure that's not what the document means, since it mentions two collectors in its tests. I'm sure the very same application would be slower if the GC is invoked even once, no matter what GC is used. If it's known that freeing is never needed, then an empty free body can be used, like in that one app I had.
Anyway, I'm posting this question for further insight.
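For concreteness, here is a minimal sketch of the kind of allocator the question describes, with invented names: allocation bumps a pointer inside one big block, the per-object free is a deliberate no-op (like the empty free(void*) body above), and the only real deallocation happens once, when the whole arena is destroyed at termination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical arena allocator: one big block, bump-pointer allocation,
// no-op per-object free, single real free at the end.
struct Arena {
    char*       base;
    std::size_t used;
    std::size_t cap;
};

Arena arena_create(std::size_t cap) {
    return Arena{static_cast<char*>(std::malloc(cap)), 0, cap};
}

void* arena_alloc(Arena& a, std::size_t n) {
    n = (n + 7) & ~std::size_t{7};              // round up to 8-byte alignment
    if (!a.base || a.used + n > a.cap) return nullptr;
    void* p = a.base + a.used;
    a.used += n;
    return p;
}

void arena_free(Arena&, void*) {}               // deliberately does nothing

void arena_destroy(Arena& a) {                  // the one real deallocation
    std::free(a.base);
    a.base = nullptr;
    a.used = a.cap = 0;
}
```

Per-allocation cost is a bounds check and a pointer bump, which is exactly why this pattern is fast when nothing ever needs to be freed individually.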
GCs can pointer-bump allocate into a thread-local generation and then rely upon copying collection to handle the (relatively) uncommon case of evacuating the survivors. Traditional allocators like malloc often compete for global locks and search trees. GCs can deallocate many dead blocks simultaneously by resetting the thread-local allocation buffer instead of calling free on each block in turn, i.e. O(1) instead of O(n).

Other ways a GC can be faster:

- By compacting old blocks so more of them fit into each cache line. The improved locality increases cache efficiency.
- By taking advantage of extra static information, such as immutable types.
- By taking advantage of extra dynamic information, such as the changing topology of the heap via the data recorded by the write barrier.
- By making more efficient techniques tractable, e.g. by removing the headache of manual memory management from wait-free algorithms.
- By deferring deallocation to a more appropriate time, or off-loading it to another core. (Thanks to Andrew Hill for this idea!)
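The thread-local allocation buffer (TLAB) idea above can be sketched roughly as follows; the names and sizes are invented for illustration. Each thread bump-allocates out of its own buffer with no locking, and after survivors have been evacuated by a copying collection, every dead object in the buffer is "freed" by a single pointer reset: O(1), no matter how many objects were allocated.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kTlabSize = 1 << 16;   // illustrative buffer size

// One buffer per thread; no lock is needed on the fast path.
struct Tlab {
    unsigned char buf[kTlabSize];
    std::size_t   top = 0;
};

void* tlab_alloc(Tlab& t, std::size_t n) {
    n = (n + 7) & ~std::size_t{7};            // keep allocations aligned
    if (t.top + n > kTlabSize) return nullptr; // full: a real GC collects here
    void* p = t.buf + t.top;
    t.top += n;                               // the whole allocation is a bump
    return p;
}

// After a copying collection has moved survivors elsewhere, all remaining
// objects in the buffer are dead and reclaimed by one assignment.
void tlab_reset(Tlab& t) { t.top = 0; }
```

Compare this with calling free on each of N dead blocks in turn: the reset does not even look at the dead objects.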
One approach to making GC faster than explicit deallocation is to deallocate implicitly: the heap is divided into partitions, and the VM switches between the partitions from time to time (when a partition gets too full, for example). Live objects are copied to the new partition, and the dead objects are not deallocated at all; they are simply left behind and forgotten. So the deallocation itself ends up costing nothing. An additional benefit of this approach is that heap defragmentation comes for free.

Please note this is a very general description of the actual process.
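The partition-switching scheme described above is essentially a semispace copying collector. Here is a heavily simplified toy version (the flat object layout and the explicit live flag stand in for real reachability tracing): live objects are copied into the empty partition and the partitions swap roles, while dead objects are never touched individually.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Obj {
    int  value;
    bool live;   // stands in for real reachability analysis from roots
};

struct Heap {
    std::vector<Obj> from;   // active partition
    std::vector<Obj> to;     // empty partition
};

void collect(Heap& h) {
    h.to.clear();
    for (const Obj& o : h.from)
        if (o.live) h.to.push_back(o);   // evacuate survivors only
    std::swap(h.from, h.to);             // flip partitions
    h.to.clear();                        // old partition reclaimed wholesale
}
```

The cost of a collection is proportional to the number of live objects, not the number of dead ones, which is why abandoning garbage wholesale can beat freeing it block by block.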
The trick is that the underlying allocator for a garbage collector can be much simpler than an explicit one, and can take shortcuts that the explicit one can't.
A factor not yet mentioned is that when using manual memory management, even if object references are guaranteed not to form cycles, determining when the last entity holding a reference has abandoned it can be expensive, typically requiring reference counters, reference lists, or other means of tracking object usage. Such techniques aren't too bad on single-processor systems, where the cost of an atomic increment may be essentially the same as that of an ordinary one, but they scale very badly on multiprocessor systems, where atomic increment operations are comparatively expensive.
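A small sketch of the bookkeeping this answer refers to, with invented names: once objects are shared across threads, manual lifetime tracking needs atomic reference-count updates (this is what std::shared_ptr's control block does internally), and it is exactly these atomic operations that become comparatively expensive on multiprocessor systems.

```cpp
#include <atomic>
#include <cassert>

struct RefCounted {
    std::atomic<int> refs{1};   // the creator holds the first reference
};

// Each new owner must atomically bump the count; a plain int would race
// when two threads retain or release concurrently.
void retain(RefCounted& o) {
    o.refs.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the last reference was dropped, i.e. the caller is
// now responsible for deallocating the object.
bool release(RefCounted& o) {
    return o.refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

A tracing GC avoids paying this per-pointer-copy cost because it discovers liveness by scanning, rather than counting every ownership transfer as it happens.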