Boehm GC 如何用于 C 程序?

发布于 2024-10-14 12:43:50 字数 114 浏览 9 评论 0原文

我检查了 Boehm GC。 C/C++ 的 GC。

我知道标记和清除算法。我很好奇的是它如何只获取整个 C 内存中的指针。我对C内存的理解只是一个普通的字节数组。是否可以确定内存中的值是否为指针?

I checked Boehm GC. The GC for C/C++.

I know mark-and-sweep algorithm. What I'm in curious is how it picks up only pointers in whole C memory. My understanding about C memory is just a plain byte array. Is it possible to determine a value in memory is pointer or not?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

断念 2024-10-21 12:43:50

Boehm GC 是一个保守的收集器,这意味着它假设所有内容(在指针边界上)都是指针。这意味着它可以找到误报引用,例如恰好具有堆中地址值的整数。因此,某些块在内存中的停留时间可能比使用非保守收集器时要长。

以下是 Boehm 页面 的描述:

垃圾收集器使用修改后的
标记-清除算法。从概念上讲
大致分为四个阶段,其中
偶尔作为一部分执行
内存分配:

  1. 准备工作 每个对象都有一个关联的标记位。清除所有标记
    位,表示所有对象都是
    可能无法访问。
  2. 标记阶段标记所有可以通过链到达的对象
    来自变量的指针。常常是
    收藏家没有真实信息
    关于指针的位置
    堆中的变量,因此它查看所有
    静态数据区、堆栈和
    注册为可能包含
    指针。任何位模式
    表示堆内的地址
    收集器管理的对象是
    被视为指针。除非客户
    程序已经进行了堆对象布局
    可获得的信息
    收集器,发现的任何堆对象
    再次可从变量访问
    类似地进行扫描。
  3. 扫描阶段扫描堆中是否无法访问,因此未标记,
    对象,并将它们返回到
    适当的空闲列表以供重用。这
    并不是真正的一个单独的阶段;甚至
    在非增量模式下这是
    操作通常执行于
    分配期间的需求
    发现一个空的空闲列表。因此
    扫相是不太可能接触到的
    一个本来不会出现的页面
    无论如何,此后不久就触摸了。
  4. 最终阶段 已注册的无法访问的对象
    最终确定已排队
    在收集器之外完成。

您还应该知道,需要为 Boehm GC 提供一组“根”,它们是标记和清除算法的起点。堆栈和寄存器自动成为根。您需要显式添加全局指针作为根。


编辑:在评论中,人们指出了对保守派收藏家的一些担忧。确实,看起来像指向收集器的堆指针的整数可能会导致内存无法释放。这并不像您想象的那么大问题。程序中的大多数标量整数用于计数和大小,并且相当小(因此它们看起来不像堆指针)。您通常会遇到包含位图、字符串、浮点数据或任何此类数据的数组的问题。 Boehm GC 允许您使用 GC_MALLOC_ATOMIC 分配块,这向收集器指示该块将不包含任何指针。如果您查看 gc_typed.h,您还会找到指定哪些部分的方法块可以包含指针。

也就是说,保守收集器的一个基本限制是它无法在收集期间安全地移动内存,因为指针重写不安全。这意味着您将无法获得压缩的任何好处,例如减少碎片和提高缓存性能。

The Boehm GC is a conservative collector, which means it assumes everything (on pointer boundaries) is a pointer. This means that it can find false positive references, like an integer which coincidentally has the value of an address in the heap. As a result, some blocks may stay in memory longer than they would with a non-conservative collector.

Here's a description from Boehm's page:

The garbage collector uses a modified
mark-sweep algorithm. Conceptually it
operates roughly in four phases, which
are performed occasionally as part of
a memory allocation:

  1. Preparation Each object has an associated mark bit. Clear all mark
    bits, indicating that all objects are
    potentially unreachable.
  2. Mark phase Marks all objects that can be reachable via chains of
    pointers from variables. Often the
    collector has no real information
    about the location of pointer
    variables in the heap, so it views all
    static data areas, stacks and
    registers as potentially containing
    pointers. Any bit patterns that
    represent addresses inside heap
    objects managed by the collector are
    viewed as pointers. Unless the client
    program has made heap object layout
    information available to the
    collector, any heap objects found to
    be reachable from variables are again
    scanned similarly.
  3. Sweep phase Scans the heap for inaccessible, and hence unmarked,
    objects, and returns them to an
    appropriate free list for reuse. This
    is not really a separate phase; even
    in non incremental mode this is
    operation is usually performed on
    demand during an allocation that
    discovers an empty free list. Thus the
    sweep phase is very unlikely to touch
    a page that would not have been
    touched shortly thereafter anyway.
  4. Finalization phase Unreachable objects which had been registered for
    finalization are enqueued for
    finalization outside the collector.

You should also know that the Boehm GC needs to be given a set of "roots", which are starting points for the mark-and-sweep algorithm. The stack and registers are automatically roots. You need to explicitly add global pointers as roots.


EDIT: In comments, some concerns were pointed out about conservative collectors in general. It is true that integers that look like heap pointers to the collector can cause memory not to be released. This is not as big of a problem as you might think. Most scalar integers in a program are used for counts and sizes and are fairly small (so they would not look like heap pointers). You would mostly run into problems with arrays containing bitmaps, strings, floating point data, or anything of that sort. Boehm GC lets you allocate a block with GC_MALLOC_ATOMIC which indicates to the collector that the block will not contain any pointers. If you look in gc_typed.h, you will also find ways to specify what parts of a block may contain pointers.

That said, a fundamental limitation of a conservative collector is that it cannot safely move memory around during collection since pointer rewriting is not safe. This means you won't get any of the benefits of compaction like lowered fragmentation and improved cache performance.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文