(C) 堆分配器如何处理 4 字节块头，同时仅返回 8 倍数的地址？

发布于 2024-09-04 01:55:02 字数 686 浏览 8 评论 0原文

这似乎没有意义，除非我们忽略段开头任何潜在的多余空间，然后让第一个分配的块位于 8 的第一个倍数处（其对应的第一个标头是该地址 -4）。这会留下之前的许多字节未使用。一般都是这样做的吗？

编辑： 感谢 paxdiablo 下面的详细解释。这对于 16 字节标头来说都是有意义的。但是，我正在使用 4 字节标头，它看起来像这样：

struct mhdr {
    int size;  // size of this block
} tMallocHdr;

现在，如果我的堆从 8 的倍数的地址开始，并且 malloc 返回的任何地址都需要是 8 的倍数，并且我需要使用 4 字节标头，我似乎被迫“浪费”堆的前 4 个字节。例如：

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
              ^
              (heap starts)

如果堆从上面的胡萝卜地址 8 开始，使用本例中的寻址方案，在 malloc 调用之后我可以返回给用户的第一个可返回地址将是 16；我需要 4 个字节的标头，并且允许 4 个字节标头的 8 的倍数的第一个地址是 16（标头从 12 开始）。这意味着我浪费了内部堆内存的前 4 个字节来排列内容 (8-11)。

这是一个可以接受的牺牲，还是我的想法是错误的？

原文

It doesn't seem to make sense, unless we just ignore any potential excess space at the beginning of a segment, and then have the first allocated chunk be at the first multiple of 8 (with its corresponding first header being that address -4). This would leave however many bytes before that unused. Is that what's generally done?

edit:
thanks to paxdiablo for the detailed explanation below. that all makes sense for 16 byte headers. however, i'm working with a 4 byte header, that looks something like this:

struct mhdr {
    int size;  // size of this block
} tMallocHdr;

now, if my heap starts on an address that is a multiple of 8, and any address returned by malloc needs to be a multiple of 8, and i need to use 4 byte headers, I seem to be forced to 'waste' the first 4 bytes of my heap. for example:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
              ^
              (heap starts)

If the heap starts at the carrot above at address 8, using the addressing scheme in this example, the first returnable address i could return to a user following a malloc call would be 16; i need 4 bytes of header, and the first address that's a multiple of 8 which allows 4 bytes of header is 16 (header starts at 12). This means I've wasted the first 4 bytes of my internal heap memory to line things up (8-11).

Is this an acceptable sacrifice, or am I thinking about this wrong?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

甜柠檬 2024-09-11 01:55:02

一般来说，分配区域的块头中存在浪费的空间。我见过的许多实现都使用 16 字节（我在这里使用字节的经典定义，八位之一，而不是 ISO C 定义）标头，紧接在从 malloc，并将分配的区域填充到末尾的 16 个字节。

这大大简化了用于分配的算法，并且还保证返回的内存地址将针对架构进行适当的对齐。

但请记住，这是一个实现细节，而不是 C 标准的功能。如果没有对齐要求并且内存分配限制为 255 字节，那么很可能只会浪费一个字节（尽管以更复杂的算法为代价）。

您可能拥有一个仅分配 256 字节块的嵌入式应用程序，在这种情况下，您可能拥有一个简单的基于位图的分配器（位图存储在其他地方而不是与正在分配的内存块内嵌），这很可能会浪费每个块只有一位（我们之前在低内存环境中已经这样做过）。

或者，也许您在大地址空间和内存环境中拥有一个分配器，无论您需要什么，它都能为您提供 4G。然后，标题就没有浪费，但填充可能足够了:-)

或者，您可能从特定大小的竞技场获得一个块（来自竞技场 A 的 1-64 字节，来自竞技场的 65-128 字节） B 等）。这意味着不需要标头，但仍然允许可变大小（最大），并且比上述 4G 解决方案大大减少了浪费。

底线是，这取决于实施。

在 malloc 的（相当简单的）实现中，您可以拥有一个由分配器给出的“块”的双向链接列表。这些块由标头和数据部分组成，为了确保数据部分的对齐正确，标头必须位于 16 字节边界上，并且长度是 16 字节的倍数（这是为了满足 16 字节对齐要求） - 您的实际要求可能没有那么严格）。

此外，块本身被填充，因此它是 16 字节的倍数，以便下一个块也适当对齐。

这保证了任何块的数据部分（即提供给 malloc 调用者的地址）正确对齐。

因此，您很可能会在该区域遭受浪费。标头（具有 4 字节整数和指针）可能只是：

struct mhdr {
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
    int junk;          // for padding.
} tMallocHdr;

意味着这 16 个字节中的 4 个将被浪费。有时，这对于满足其他要求（对齐）是必要的，并且在任何情况下，您都可以将该填充空间用于其他目的。正如一位评论者指出的那样，保护字节可用于检测某些形式的竞技场损坏：

struct mhdr {
    int guard_bytes;   // set to 0xdeadbeef to detect some corruptions.
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
} tMallocHdr;

而且，虽然这种填充在技术上是浪费空间，但只有当它占整个竞技场的相当大的比例时，它才变得重要。如果您分配 4K 内存块，则这四个字节的浪费仅为总大小的千分之一左右。实际上，作为用户，您的浪费可能是标头的整个 16 个字节，因为这是您无法使用的内存，因此大约为 0.39% (16 / (4096 + 16))。

这就是为什么 malloc 的字符链接列表是一个非常糟糕的主意 - 您倾向于对每个字符使用：

16 字节的标头。
字符占 1 个字节（假设为 8 位字符）。
15 字节的填充。

这会将 0.39% 的数字变为 96.9% 的浪费 (31 / (31 + 1))。

并且，回答您的进一步问题：

这是一个可以接受的牺牲，还是我的想法是错误的？

我会说，是的，这是可以接受的。通常，您分配较大的内存块，其中四个字节不会对总体方案产生影响。正如我之前所说，如果您分配大量小东西，那么这不是一个好的解决方案，但这不是 malloc 的一般用例。

Generally, there is wasted space in a block header of an allocated region. Many implementations I've seen use a 16-byte (I'm using the classical definition of byte here, one of eight bits, rather than the ISO C definition) header, immediately before the address returned from malloc, and pad out the allocated region to 16 bytes at the end as well.

This greatly simplifies the algorithms used for allocation and also guarantees that memory addresses returned will be aligned appropriately for the architecture.

But keep in mind that this is an implementation detail, not a feature of the C standard. If there are no alignment requirements and memory allocations are limited to 255 bytes, it's quite plausible that only one byte will be wasted (albeit at the expense of a more complicated algorithm).

It's quite plausible that you could have an embedded application that only ever allocates 256-byte chunks in which case you could have a simple bitmap-based allocator (with the bitmap stored elsewhere rather than in-line with the memory blocks being allocated), wasting only one bit per chunk (we've done this before in low memory environments).

Or maybe you have an allocator in a large address-space and memory environment that gives you 4G no matter what you ask for. Then there's no wastage for a header, but probably plenty for the padding :-)

Or maybe you get a chunk from a size-specific arena (1-64 bytes from arena A, 65-128 from arena B and so on). This means no header required but still allowing variable size (up to a maximum) and substantially less wastage than the 4G solution above.

Bottom line, it depends on the implementation.

In a (reasonably simple) implementation of malloc, you could have a doubly-linked list of "chunks" which are given out by the allocator. These chunks consist of a header and a data section and, to ensure alignment of the data section is correct, the header must be on a 16-byte boundary and be a multiple of 16 bytes in length (that's for a 16-byte alignment requirement - your actual requirement may not be that stringent).

In addition, the chunk itself is padded so it's a multiple of 16 bytes so that the next chunk is suitably aligned as well.

That guarantees that the data section of any chunk, which is the address given to the caller of malloc, is aligned correctly.

So you may well get wastage in that area. A header (with 4-byte integers and pointers) may simply be:

struct mhdr {
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
    int junk;          // for padding.
} tMallocHdr;

meaning that four of those sixteen bytes will be wasted. Sometimes that's necessary to meet the other requirements (alignment) and, in any case, you can use that padding space for other purposes. As one commenter pointed out, guard bytes can be used to detect some forms of arena corruption:

struct mhdr {
    int guard_bytes;   // set to 0xdeadbeef to detect some corruptions.
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
} tMallocHdr;

And, while that padding is technically wasted space, it only becomes important if it's a sizeable proportion of the entire arena. If you're allocating 4K blocks of memory, the four bytes of wastage is only about a thousandth of the total size. Actually, the wastage to you as a user is probably the entire 16 bytes of the header since that's memory you can't use, so it's about 0.39% (16 / (4096 + 16)).

That's why a malloc'ed linked list of characters is a very bad idea - you tend to use, for every single character:

16 bytes of header.
1 byte for the character (assuming 8-bit characters).
15 bytes of padding.

This would change your 0.39% figure into 96.9% wastage (31 / (31 + 1)).

And, in response to your further question:

Is this an acceptable sacrifice, or am I thinking about this wrong?

I would say, yes, this is acceptable. Generally, you allocate larger chunks of memory where four bytes won't make a difference in the grand scheme of things. As I stated earlier, it's not a good solution if you allocate lots of little things but that's not the general use case of malloc.

回复收藏 0 原文

冷︶言冷语的世界 2024-09-11 01:55:02

某些分配器没有任何已分配块的标头。有些消除了小块的标头。除了安全性之外，分配的块根本不需要标头。这可以通过其他方式来实现，例如，通过在物理上与所分配的存储器完全分离的管理数据。

即使类型化的内存分配器也可以消除标头。例如，请参阅 BIBOP。

回复收藏 0 原文

没有伤那来痛 2024-09-11 01:55:02

您可能会发现，如果返回的块始终在 8 字节边界上对齐，则实现将使用 8 字节块来保存任何控制信息。我相信 K&R（第一版和第二版）中的实现可以做到这一点。事实上，它使用了一个指针和一个大小，因此它有 8 个字节的控制信息。如果它使用 8 字节对齐的块，那么这样做并没有真正的惩罚。（对于 64 位机器，最严格的对齐可能是 16 字节 - 例如，它在 SPARC 上 - 因此块开销会更大一些，但原因基本相同。）

回复收藏 0 原文

~没有更多了~