Writing a custom malloc that stores information in the pointer

Posted on 2024-11-29 09:58:47

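I have recently been reading about a family of automatic memory management techniques that rely on storing information in the pointer returned by the allocator, i.e. a few header bits, used for example to distinguish between kinds of pointers or to store thread-related information (note that I am not talking about limited-field reference counting here, only immutable information).

I would like to experiment with these techniques. To implement them, I need to be able to return pointers with a specific shape from my allocator. I suppose I could use the lowest bits, but that would require padding that looks quite memory-consuming, so I believe I should use the highest bits. However, I have no idea how to do this. Is there a way for me to call malloc or malloc_create_zone or some related function and request pointers that always start with given bits?

Thanks, everyone!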


茶花眉 2024-12-06 09:58:47

The amount of information you can actually store in a pointer is pretty limited (typically one or two bits per pointer). And every attempt to dereference the pointer has to first mask out the magic information. The technique is often called tagging, BTW.

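 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>

 #define TAG_MASK   0x3   /* the two lowest bits hold the tag */
 #define CONS_TAG   0x1
 #define STRING_TAG 0x2
 #define NUMBER_TAG 0x3

 typedef uintptr_t value_t;
 typedef struct cons {
     value_t car;
     value_t cdr;
 } cons_t;

 /* Print a message and abort; stands in for real error handling. */
 static void
 error(const char *msg)
 {
     fprintf(stderr, "%s\n", msg);
     abort();
 }

 value_t
 create_cons(value_t t1, value_t t2)
 {
     cons_t* pair = malloc(sizeof(cons_t));
     if (pair == NULL) error("out of memory");
     pair->car = t1;
     pair->cdr = t2;
     /* The low bits of the address are zero, so the tag can be OR-ed in. */
     return (value_t)pair | CONS_TAG;
 }

 value_t
 car_of_cons(value_t v)
 {
     /* Check the tag bits, then clear them to recover the real address. */
     if ((v & TAG_MASK) != CONS_TAG) error("wrong type of argument");
     return ((cons_t*) (v & ~(value_t)TAG_MASK))->car;
 }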

One advantage of this technique is that you can infer the type of an object directly from the pointer itself. You don't need to dereference it (say, to read a special type field or similar). Many language implementations that use this scheme also reserve a tag combination for "immediate" numbers and other small values, which can be represented directly in the "pointer".
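To make the "immediate" idea concrete, here is a minimal sketch that reuses NUMBER_TAG from the code above to pack a small integer into the bits above the tag, so no allocation is needed at all. The names TAG_BITS, make_number, is_number and number_value are made up for this illustration, and the decoding assumes that right-shifting a negative intptr_t is an arithmetic shift (implementation-defined in C, though common compilers behave that way).

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;

#define TAG_BITS   2      /* hypothetical name: number of tag bits */
#define TAG_MASK   0x3
#define NUMBER_TAG 0x3

/* Pack a small integer into the bits above the tag (hypothetical helper). */
static inline value_t make_number(intptr_t n)
{
    /* The value must survive losing TAG_BITS bits of range. */
    assert(n >= INTPTR_MIN / 4 && n <= INTPTR_MAX / 4);
    return ((value_t)n << TAG_BITS) | NUMBER_TAG;
}

/* True if the tag bits mark this value as an immediate number. */
static inline int is_number(value_t v)
{
    return (v & TAG_MASK) == NUMBER_TAG;
}

/* Recover the integer. Assumes >> on a negative intptr_t shifts
   arithmetically, which is implementation-defined. */
static inline intptr_t number_value(value_t v)
{
    return (intptr_t)v >> TAG_BITS;
}
```

With these helpers, make_number(-42) yields a value for which is_number is true and number_value gives back -42, without touching the heap.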

The disadvantage is that the amount of information that can be stored is pretty limited. Also, as the example code shows, you have to be aware of the tagging on every access to the object, and you need to "untag" the pointer before actually using it.

The use of the least significant bits for tagging stems from the observation that, on most platforms, all pointers to malloc'ed memory are aligned on a multi-byte boundary (usually at least 8 bytes), so the least significant bits are always zero. In other words, the allocator already provides those bits, and no extra padding is needed to free them up.
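If you would rather not depend on whatever alignment malloc happens to give you, one option is to request the alignment explicitly. The sketch below assumes a C11 library that provides aligned_alloc (not every platform does) and a 16-byte alignment, which leaves four low bits for the tag. ALIGNMENT, LOW_MASK, tagged_alloc, tag_of, pointer_of and tagged_free are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 16u                          /* assumed; must be a power of two */
#define LOW_MASK  ((uintptr_t)ALIGNMENT - 1)   /* the four low bits are free */

/* Allocate `size` bytes and return the address with `tag` stored in the
   low bits; returns 0 on allocation failure (hypothetical helper). */
static inline uintptr_t tagged_alloc(size_t size, unsigned tag)
{
    assert(tag <= LOW_MASK);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    size_t rounded = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    void *p = aligned_alloc(ALIGNMENT, rounded);
    if (p == NULL) return 0;
    assert(((uintptr_t)p & LOW_MASK) == 0);    /* low bits really are zero */
    return (uintptr_t)p | tag;
}

/* Extract the tag stored in the low bits. */
static inline unsigned tag_of(uintptr_t v)
{
    return (unsigned)(v & LOW_MASK);
}

/* Clear the tag to recover the address that can be dereferenced. */
static inline void *pointer_of(uintptr_t v)
{
    return (void *)(v & ~LOW_MASK);
}

/* Free a tagged allocation; the tag must be removed first. */
static inline void tagged_free(uintptr_t v)
{
    free(pointer_of(v));
}
```

Keeping the masking inside tag_of and pointer_of means the rest of the program never handles a raw tagged address by accident. The cost is that each allocation is rounded up to a multiple of the alignment.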
