How to use virtual memory / implement realloc on Mac OSX?

Posted on 2025-02-07 14:34:43, length 126, views 3, comments 0

I'm playing with assembly on mac. On linux I implemented realloc by using mmap/mremap/munmap but there doesn't seem to be a mremap on mac. How would I implement realloc using virtual memory in assembly? What system call(s) would I need? I'm targeting M1 but x86-64 solutions are fine

Comments (2)

梦里兽 2025-02-14 14:34:43

The low-level way on macOS/Mach is to use the (mach_)vm_allocate and (mach_)vm_deallocate functions. Instead of the first, you can also use (mach_)vm_map, which allows you to set the memory protection and inheritance behaviour (how to handle the area in child processes, AFAIK).

(See also What's the difference between mach_vm_allocate and vm_allocate?)

Reallocation is simply done by allocating another area exactly after the first. The kernel tries to merge (coalesce) them, if the conditions allow it. See the implementation of vm_map_enter (also used by vm_allocate), scroll down to the comment "See whether we can avoid creating a new entry …" where the logic starts.

So first you ask the kernel to allocate anywhere, then you ask it to allocate exactly after the region(s) you've already allocated.

The tags like VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE) play a role here, and as far as I understand the region you allocate to grow a previous one should use VM_MAKE_TAG(VM_MEMORY_REALLOC), but it looks like that's not strictly necessary.

You can also see this being done in magazine_large.c in Apple's libmalloc (search for VM_MEMORY_REALLOC). If this fails, a new area of the wanted size is allocated and the old content is copied using vm_copy if possible, otherwise a simple memcpy is done.


Allocations need to be page-aligned, so at runtime you need to know the page size the kernel uses. In C it's easy to get via the VM_PAGE_SIZE macro, the vm_page_size variable, or several other ways. But when it comes to assembly (without any shared libraries), how do you get this information? Via the comm page, an area the kernel maps into every process at the same location, which gives processes some information without the need for kernel calls.
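To make the page-size arithmetic concrete, here is a minimal C sketch. The helper names (ex_page_size_from_shift, ex_round_up_to_page, ex_runtime_page_size) are made up for illustration and are not part of any Apple API. It assumes the page size is a power of two, which is what a shift value such as the one in the comm page implies.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size from a shift value, e.g. the byte read from the comm page. */
static size_t ex_page_size_from_shift(unsigned shift) {
    return (size_t)1 << shift;
}

/* Round n up to the next multiple of page_size.
   page_size must be a non-zero power of two. */
static size_t ex_round_up_to_page(size_t n, size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (n + page_size - 1) & ~(page_size - 1);
}

/* Portable way to ask for the page size when libc is available. */
static size_t ex_runtime_page_size(void) {
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

A shift of 14 gives 16384-byte pages, so a request for 16385 bytes would be rounded up to 32768 before being handed to the allocation call.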


So let's do this in C first. The following source should output (res = 0) (aka KERN_SUCCESS) for every operation.

#include <stdio.h>
#include <string.h> // for memset
#include <mach/mach_vm.h>
#include <mach/mach_init.h>
#include <mach/task_info.h>
#include <unistd.h>

int main(int argc, const char * argv[]) {
#if defined(__arm64__)
    // From xnu/osfmk/arm/cpu_abilities.h
#define _COMM_PAGE64_RO_ADDRESS       (0x0000000FFFFF4000ULL)
#define _COMM_PAGE_USER_PAGE_SHIFT_64 (_COMM_PAGE64_RO_ADDRESS+0x025)
    uint8_t commPageVMPageShift = *(uint8_t const * const)_COMM_PAGE_USER_PAGE_SHIFT_64;
    printf("Page shift from COMM_PAGE: %u, page size: %u\n", commPageVMPageShift, 1 << commPageVMPageShift);
#endif
    
    mach_vm_address_t address = 0;
    mach_vm_size_t size = VM_PAGE_SIZE;
//    kern_return_t res = mach_vm_allocate(mach_task_self(), &address, size, VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE));
    kern_return_t res = mach_vm_map(mach_task_self(), &address, size, 0, VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE), MEMORY_OBJECT_NULL, 0, FALSE, VM_PROT_DEFAULT, VM_PROT_ALL, VM_INHERIT_DEFAULT);
    printf("Allocated %llu bytes at %p (res = %d)\n", size, (void *)address, res);
    memset((void *)address, 0x42, size);
    
    mach_vm_size_t newSize = size + VM_PAGE_SIZE;
    mach_vm_address_t nextAddress = address + size;
    res = mach_vm_allocate(mach_task_self(), &nextAddress, newSize - size, VM_MAKE_TAG(VM_MEMORY_REALLOC));
    printf("Allocated additional %llu bytes at %p (res = %d)\n", newSize - size, (void *)nextAddress, res);
    
    res = mach_vm_deallocate(mach_task_self(), address, newSize);
    printf("Deallocated everything (res = %d)\n", res);

    return 0;
}

Whether you use mach_vm_allocate or mach_vm_map for the first allocation doesn't matter here.
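Without the headers, the flags argument has to be passed as a raw number, so it helps to see how it is put together. The sketch below uses local stand-ins (prefixed EX_) that mirror the values in <mach/vm_statistics.h> as far as I know them; treat the constants as assumptions and check them against your SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring <mach/vm_statistics.h>; prefixed EX_ so they
   cannot clash with the real macros. */
#define EX_VM_FLAGS_FIXED          0x0000u
#define EX_VM_FLAGS_ANYWHERE       0x0001u
#define EX_VM_MEMORY_MALLOC_LARGE  3u
#define EX_VM_MEMORY_REALLOC       6u

/* The tag occupies the top 8 bits of the 32-bit flags word. */
#define EX_VM_MAKE_TAG(tag)        ((uint32_t)(tag) << 24)

/* Combine a placement flag with a memory tag. */
static uint32_t ex_vm_flags(uint32_t placement, uint32_t tag) {
    assert(tag < 256);
    return placement | EX_VM_MAKE_TAG(tag);
}
```

With these values, VM_FLAGS_ANYWHERE combined with the VM_MEMORY_MALLOC_LARGE tag comes out as 0x03000001, and the VM_MEMORY_REALLOC tag with fixed placement (as in the second call of the C program) comes out as 0x06000000.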


And now for fun, let's do it in ARM64 assembly without the use of any library:

// From xnu/osfmk/arm/cpu_abilities.h
_COMM_PAGE64_RO_ADDRESS = 0x0000000FFFFF4000
_COMM_PAGE_USER_PAGE_SHIFT_64 = (_COMM_PAGE64_RO_ADDRESS + 0x025)

// Kernel: "mach_task_self"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_TASK_SELF, -28

// Kernel: "_kernelrpc_mach_vm_allocate_trap"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_VM_ALLOCATE, -10

// Kernel: "_kernelrpc_mach_vm_deallocate_trap"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_VM_DEALLOCATE, -12

// Define some helper registers (could also use the stack or variables in heap)
REG_PAGE_SIZE .req X20
REG_MACH_TASK_SELF .req X21
REG_ADDRESS .req X22

.global _main
_main:
    // Set up a stack frame
    stp FP, LR, [SP, #-16]!
    mov FP, SP

    // Read the page size shift used by kernel from the comm page.
    mov X0, #(_COMM_PAGE_USER_PAGE_SHIFT_64 & 0xFFFF)
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 16) & 0xFFFF), LSL #16
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 32) & 0xFFFF), LSL #32
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 48) & 0xFFFF), LSL #48
    ldrb W1, [X0]
    // Calculate the page size using the shift value.
    mov W2, #1
    lslv W3, W2, W1
    // Remember the page size.
    mov REG_PAGE_SIZE, X3

    // Get mach_task_self
    mov X16, SYSCALL_MACH_TASK_SELF
    svc 80
    // Save mach_task_self for later use.
    mov REG_MACH_TASK_SELF, X0

    // Allocate one page.
    str XZR, [SP, #-16]! // Push null pointer on stack
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, SP // Arg 2: pointer to address (in/out)
    mov X2, REG_PAGE_SIZE // Arg 3: size to allocate
    mov X3, #1 // Arg 4: VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE)
    movk X3, #0x0300, LSL #16 // value of the flags: 0x3000001
    mov X16, SYSCALL_MACH_VM_ALLOCATE
    svc 80
    cbnz X0, L_exit // Exit on failure.

    // Now we have the page address on the stack.
    ldr REG_ADDRESS, [SP]
    // Calculate adjacent page address
    add X0, REG_ADDRESS, REG_PAGE_SIZE
    // Store on the stack again for next call to `vm_allocate`
    str X0, [SP]

    // Allocate adjacent page.
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, SP // Arg 2: pointer to address (in/out)
    mov X2, REG_PAGE_SIZE // Arg 3: size to allocate
    mov X3, #0 // Arg 4: VM_FLAGS_FIXED | VM_MAKE_TAG(VM_MEMORY_REALLOC), so the page must land exactly at the requested address (as in the C version)
    movk X3, #0x0600, LSL #16 // value of the flags: 0x6000000
    mov X16, SYSCALL_MACH_VM_ALLOCATE
    svc 80
    cbnz X0, L_exit // Exit on failure.

    // Now we have two pages! On the stack we have the address of the second page.
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, REG_ADDRESS // Arg 2: address (not a pointer)
    add X2, REG_PAGE_SIZE, REG_PAGE_SIZE // Arg 3: 2 * page_size
    mov X16, SYSCALL_MACH_VM_DEALLOCATE
    svc 80

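L_exit:
    // Drop the address slot, restore the saved frame, and return X0 as the exit code.
    mov SP, FP
    ldp FP, LR, [SP], #16
    ret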

Assemble and link using as -o example.o example.s ; ld -o example example.o. Run the binary, and if everything went well it should exit with return code 0.

碍人泪离人颜 2025-02-14 14:34:43

Using only POSIX mmap flags, the "optimistic" strategy is what Jester suggested, using mmap without MAP_FIXED to try to allocate new pages contiguous with what you already have. (The first arg is a "hint" of where you'd like it to allocate).

Instead of failing, it will allocate somewhere else (unless virtual address space is full, but that's unlikely on 64-bit). So you need to detect that mmap's return value != your hint. Probably just munmap that untouched space and ask again with the full size you need, then copy. You could attempt to mmap the remaining space onto the end of the new pages you just got, but that could fail and then you're making even more system calls.

On Linux you'd use mmap(MAP_FIXED_NOREPLACE) to return an error if it can't allocate where you want (without overlapping / replacing existing mappings).

Of course Linux mremap is even better, avoiding ever copying the data, just mapping the same physical pages to a new virtual address if you let it (with MREMAP_MAYMOVE). (mremap lets realloc be much more efficient for growing big arrays.) If MacOS doesn't have similar functionality via any MacOS-specific function calls or mmap flags, you simply can't get that functionality.


I find it really dumb that C++ std::vector is designed so it can't easily take advantage of realloc, and thus of mremap even where it exists, because a replaceable new is a potentially visible side effect. The new/delete allocator API also entirely lacks a try-realloc that you could use even with non-trivially-copyable types. This overly conservative design in some higher-level languages means that low-level features might not get much use even if they existed, so I wouldn't be surprised if MacOS lacked it.

OTOH, C realloc certainly can use mremap if the original allocation has its pages to itself, and lots of stuff is written in C, not hobbled by C++'s allocator API. So MacOS might well support something like this somehow, but I don't know MacOS-specific system call details.

I did have a look at the table of BSD system calls in the Darwin XNU kernel https://github.com/opensource-apple/xnu/blob/master/bsd/kern/syscalls.master as suggested by macOS 64-bit System Call Table

There might be other whole categories of system call, but I'd hope that any mmap-related calls would be in the BSD family of calls, using the 0x2000000 class bit.

There is an int memorystatus_control(uint32_t command, int32_t pid, uint32_t flags, user_addr_t buffer, size_t buffersize); but that returns an int, so I assume it's not what we're looking for.

I didn't see any other system calls that looked at all promising for this.

I didn't check the MacOS man page for mmap; if it has any MacOS-specific flags like MAP_FIXED_NOREPLACE, they'd hopefully be there.
