Aligned memory management?
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
1. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
2. If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OSes. I don't care about obscure DS9K-like OSes.)
3. This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
1. If your implementation has a standard data type that needs 16-byte alignment (long long, for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."
2. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
3. See (1) above if you already have a 16-byte-alignment-required type. Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency, although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure: some padding for alignment, then one byte holding the padding length, then the aligned memory area.
Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
This is untested but should be a good start:
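A minimal sketch of the malloc16/free16 layer described above, written from the description rather than copied from the answer's own listing. It assumes the padding length fits in the single byte just before the aligned area (it is always between 1 and 16 here) and does the arithmetic through uintptr_t, since C does not allow & directly on a pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns a 16-byte aligned block of at least `size` bytes, or NULL. */
void *malloc16(size_t size)
{
    if (size > SIZE_MAX - 16)
        return NULL;                      /* size + 16 would overflow */

    unsigned char *porig = malloc(size + 16);
    if (porig == NULL)
        return NULL;

    /* Add 16, then clear the low 4 bits: the result is 1..16 bytes past porig. */
    uintptr_t aligned = ((uintptr_t)porig + 16) & ~(uintptr_t)0xf;
    size_t pad = (size_t)(aligned - (uintptr_t)porig);

    unsigned char *p = porig + pad;
    p[-1] = (unsigned char)pad;           /* remember the padding length */
    return p;
}

/* Releases a block obtained from malloc16(); NULL is accepted like free(). */
void free16(void *ptr)
{
    if (ptr == NULL)
        return;
    unsigned char *p = ptr;
    free(p - p[-1]);                      /* step back to what malloc returned */
}
```

Blocks from malloc16() must only go to free16(), never to free() or realloc() directly, because the pointer handed out is not the one malloc returned.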
The magic line in malloc16 is p = (porig + 16) & (~0xf); which adds 16 to the address then sets the lower 4 bits to 0, in effect bringing it back to the next lowest alignment point (the +16 guarantees it is past the actual start of the malloc'ed block).
Now, I don't claim that the code above is anything but kludgey. You would have to test it on the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
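To make the rounding step concrete, here is the same expression applied to plain integers. The helper name bump16 is made up for illustration. With it, 0x1000, 0x1001 and 0x100f all map to 0x1010, so the result is always 1 to 16 bytes past the input, which is what leaves room for the padding-length byte.

```c
#include <stdint.h>

/* Hypothetical helper: the "+16, then clear the low 4 bits" step on an integer address. */
uintptr_t bump16(uintptr_t addr)
{
    return (addr + 16) & ~(uintptr_t)0xf;
}
```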
Starting with C11, you have the void *aligned_alloc( size_t alignment, size_t size ); primitive, where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. An integral multiple of alignment.
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
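A small sketch of how aligned_alloc might be used for the 16-byte case. The wrapper name alloc16 is hypothetical; it rounds the request up so that size is an integral multiple of the alignment, as the parameter description above requires. It assumes a C11 library that provides aligned_alloc (not every platform's C runtime does).

```c
#include <stdlib.h>

/* Hypothetical wrapper: 16-byte aligned allocation via C11 aligned_alloc(). */
void *alloc16(size_t size)
{
    /* Round up to the next multiple of 16. */
    size_t rounded = (size + 15) & ~(size_t)15;
    if (rounded < size)
        return NULL;                 /* rounding overflowed */
    return aligned_alloc(16, rounded);
}
```

The result is released with plain free().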
I'm not aware of any way of requesting that malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
"GNU libc malloc() always returns 8-byte aligned memory addresses, so these routines are only needed if you require larger alignment values."
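For reference, a minimal sketch of calling posix_memalign() for a 16-byte boundary. The wrapper name alloc_aligned16 is made up; posix_memalign() itself returns 0 on success and an error number otherwise, and stores the block through its first argument.

```c
#include <stdlib.h>

/* Hypothetical wrapper: returns a 16-byte aligned block, or NULL on failure. */
void *alloc_aligned16(size_t size)
{
    void *p = NULL;
    /* The alignment must be a power of two and a multiple of sizeof(void *). */
    if (posix_memalign(&p, 16, size) != 0)
        return NULL;
    return p;
}
```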
You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
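One way the realloc()-plus-memmove() idea could look, as a sketch only. The buf16 struct and buf16_resize name are hypothetical; the struct keeps the raw pointer that realloc() returned (the one that must eventually go to free()) next to the aligned view of the data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: raw is what realloc() returned, data is the aligned view. */
typedef struct {
    void  *raw;
    void  *data;
    size_t size;   /* usable bytes at data */
} buf16;

/* Resize b to new_size bytes, keeping data 16-byte aligned.
   Returns 0 on success, -1 on failure (b is left untouched).
   A zero-initialised buf16 is treated as an empty buffer. */
int buf16_resize(buf16 *b, size_t new_size)
{
    if (new_size > SIZE_MAX - 15)
        return -1;

    /* Offset of the data inside the old block; computed before realloc(). */
    size_t old_off = b->raw
        ? (size_t)((unsigned char *)b->data - (unsigned char *)b->raw) : 0;

    unsigned char *raw = realloc(b->raw, new_size + 15);
    if (raw == NULL)
        return -1;

    size_t new_off = (size_t)(-(uintptr_t)raw & 15);   /* 0..15 bytes of padding */
    size_t keep = b->size < new_size ? b->size : new_size;

    /* If the block moved to an address with different low bits, slide the data. */
    if (new_off != old_off && keep > 0)
        memmove(raw + new_off, raw + old_off, keep);

    b->raw = raw;
    b->data = raw + new_off;
    b->size = new_size;
    return 0;
}
```

When the buffer is no longer needed, b.raw (not b.data) is what gets passed to free().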
You could write your own slab allocator to handle your objects. It could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
扩展映射。The trickiest requirement is obviously the third one, since any
malloc()
/realloc()
based solution is hostage torealloc()
moving the block to a different alignment.On Linux, you could use anonymous mappings created with
mmap()
instead ofmalloc()
. Addresses returned bymmap()
are by necessity page-aligned, and the mapping can be extended withmremap()
.在您的系统上进行实验。在许多系统(尤其是 64 位系统)上,无论如何,您都可以从
malloc()
中获得 16 字节对齐的内存。如果没有,您将必须分配额外的空间并移动指针(几乎每台机器上最多移动 8 个字节)。例如,x86/64 上的 64 位 Linux 具有 16 字节长双精度型,它是 16 字节对齐的 - 因此所有内存分配无论如何都是 16 字节对齐的。但是,对于 32 位程序,
sizeof(long double)
为 8,并且内存分配仅按 8 字节对齐。是的 - 您只能
free()
由malloc()
返回的指针。其他任何事情都会导致灾难。如果您的系统执行 16 字节对齐分配,则不会有问题。如果没有,那么您将需要自己的重新分配器,它执行 16 字节对齐分配,然后复制数据 - 或者使用系统
realloc()
并在以下情况下调整重新对齐的数据:必要的。仔细检查
malloc()
的手册页;可能有一些选项和机制可以对其进行调整,使其按照您的意愿运行。在 MacOS X 上,有 posix_memalign() 和 valloc()(提供页对齐分配),并且识别出一系列“分区 malloc”函数由
man malloc_zoned_malloc
编写,标头为
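A Linux-only sketch of the anonymous-mapping approach. The helper names map_alloc, map_resize and map_free are invented for illustration; mremap() needs _GNU_SOURCE with glibc, and the caller has to remember the mapping's size because munmap() and mremap() both require it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes mremap() and MREMAP_MAYMOVE on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: page-aligned, zero-filled anonymous mapping, or NULL. */
void *map_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: grow or shrink a mapping. The kernel may move it,
   but the new address is still page-aligned. Returns NULL on failure. */
void *map_resize(void *p, size_t old_size, size_t new_size)
{
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}

/* Hypothetical helper: release the whole mapping. */
void map_free(void *p, size_t size)
{
    munmap(p, size);
}
```

Mappings are handed out in whole pages, so this suits large buffers better than many small objects.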
Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).
For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned, so all memory allocations are 16-byte aligned anyway. However, with a 32-bit program, sizeof(long double) is 8 and memory allocations are only 8-byte aligned.
Yes, you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data, or one that uses the system realloc() and adjusts the realigned data when necessary.
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On MacOS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions identified by man malloc_zoned_malloc; the header is <malloc/malloc.h>.
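For the "experiment on your system" suggestion above, a rough probe along these lines could be used. The function name observed_malloc_alignment is hypothetical, and the result only describes what a handful of small allocations happened to return; it is an observation, not a guarantee from the allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical probe: the largest power of two (capped at 4096) that divides
   every address returned by a batch of small malloc() calls. */
size_t observed_malloc_alignment(void)
{
    enum { TRIALS = 64 };
    void *ptrs[TRIALS];
    uintptr_t bits = 0;

    /* Keep every block alive so that the addresses are all distinct. */
    for (size_t i = 0; i < TRIALS; i++) {
        ptrs[i] = malloc(i + 1);
        bits |= (uintptr_t)ptrs[i];      /* a NULL result contributes nothing */
    }
    for (size_t i = 0; i < TRIALS; i++)
        free(ptrs[i]);

    /* The lowest set bit across all addresses is the alignment they share. */
    size_t align = 1;
    while (align < 4096 && (bits & align) == 0)
        align <<= 1;
    return align;
}
```

If this reports 16 or more on the platforms you care about, plain malloc() may already be enough in practice, though the standard still does not promise it.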
You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete