dlopen from memory?

Posted on 2024-10-18 08:22:51

I'm looking for a way to load generated object code directly from memory.

I understand that if I write it to a file, I can call dlopen to dynamically load its symbols and link them. However, this seems a bit of a roundabout way, considering that it starts off in memory, is written to disk, and then is reloaded in memory by dlopen. I'm wondering if there is some way to dynamically link object code that exists in memory. From what I can tell there might be a few different ways to do this:

  1. Trick dlopen into thinking that your memory location is a file, even though it never leaves memory.

  2. Find some other system call which does what I'm looking for (I don't think this exists).

  3. Find some dynamic linking library which can link code directly in memory. Obviously, this one is a bit hard to google for, as "dynamic linking library" turns up information on how to dynamically link libraries, not on libraries which perform the task of dynamically linking.

  4. Abstract some API from a linker and create a new library out of its codebase. (Obviously this is the least desirable option for me.)

So which of these are possible? Feasible? Could you point me to any of the things I hypothesized might exist? Is there another way I haven't even thought of?

Comments (9)

夜血缘 2024-10-25 08:22:51

I needed a solution to this because I have a scriptable system that has no filesystem (it uses blobs from a database) and needs to load binary plugins to support some scripts. This is the solution I came up with, which works on FreeBSD but may not be portable.

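#include <dlfcn.h>     /* fdlopen (FreeBSD-specific), RTLD_LAZY */
#include <fcntl.h>     /* O_RDWR */
#include <string.h>    /* memcpy */
#include <sys/mman.h>  /* shm_open, SHM_ANON, mmap, munmap */
#include <unistd.h>    /* ftruncate, close */

void *dlblob(const void *blob, size_t len) {
    /* Create an anonymous shared-memory file descriptor */
    int fd = shm_open(SHM_ANON, O_RDWR, 0);
    ftruncate(fd, len);
    /* Map the descriptor into memory and copy the blob into it */
    void *mem = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(mem, blob, len);
    munmap(mem, len);
    /* Open the dynamic library from the shared-memory descriptor */
    void *so = fdlopen(fd, RTLD_LAZY);
    close(fd);
    return so;
}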

Obviously the code lacks any kind of error checking etc, but this is the core functionality.

ETA: My initial assumption that fdlopen is POSIX was wrong; it appears to be a FreeBSD-ism.

素手挽清风 2024-10-25 08:22:51

I don't see why you'd be considering dlopen, since that will require a lot more nonportable code to generate the right object format on disk (e.g. ELF) for loading. If you already know how to generate machine code for your architecture, just mmap memory with PROT_READ|PROT_WRITE|PROT_EXEC and put your code there, then assign the address to a function pointer and call it. Very simple.
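A minimal sketch of that approach, assuming a Linux-like system that still allows writable and executable anonymous mappings (hardened kernels and some platforms refuse them). The helper names run_machine_code and run_return_42 are made up for illustration, and the byte sequences are hand-assembled encodings of "return 42" for x86 and AArch64 only:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy raw machine code into an executable mapping,
 * call it as int (*)(void), and return its result (-1 if mmap fails). */
int run_machine_code(const unsigned char *code, size_t len) {
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);
    /* GCC/Clang builtin; required on CPUs with a separate instruction cache. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, len);
    return result;
}

/* Runs a tiny generated function equivalent to: int f(void) { return 42; } */
int run_return_42(void) {
#if defined(__x86_64__) || defined(__i386__)
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
#elif defined(__aarch64__)
    /* mov w0, #42 ; ret (little-endian instruction words) */
    static const unsigned char code[] = {0x40, 0x05, 0x80, 0x52,
                                         0xC0, 0x03, 0x5F, 0xD6};
#else
#error "add an encoding of 'return 42' for this architecture"
#endif
    return run_machine_code(code, sizeof code);
}
```

Note that this only works for code that is already position-correct: nothing here resolves symbols or applies relocations, which is exactly the work dlopen would otherwise do.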

那支青花 2024-10-25 08:22:51

There is no standard way to do it other than writing out the file and then loading it again with dlopen().

You may find some alternative method on your current specific platform. It will be up to you to decide whether that is better than using the 'standard and (relatively) portable' approach.

Since generating the object code in the first place is rather platform-specific, additional platform-specific techniques may not matter to you. But it is a judgement call — and in any case, it depends on there being a non-standard technique, which is relatively improbable.

夜还是长夜 2024-10-25 08:22:51

We implemented a way to do this at Google. Unfortunately, upstream glibc has failed to comprehend the need, so it was never accepted. The feature request with patches has stalled. It's known as dlopen_with_offset.

The dlopen_with_offset glibc code is available in the glibc google/grte* branches. But nobody should enjoy modifying their own glibc.

烟若柳尘 2024-10-25 08:22:51

Here's how you can do it entirely in memory on Linux (no writing to /tmp/xxx), using a memory fd created with memfd_create:

user@system $ ./main < example-library.so
add(1, 2) = 3

// example-library.c
int add(int a, int b) { return a + b; }

#include <cstdio>
#include <dlfcn.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <vector>

// Compile and then invoke as:
// $ ./main < my-shared-lib.so
int main() {
  // Read the shared library contents from stdin
  std::vector<char> library_contents;
  char buffer[1024];
  ssize_t bytes_read;
  while ((bytes_read = read(STDIN_FILENO, buffer, sizeof(buffer))) > 0) {
    library_contents.insert(library_contents.end(), buffer,
                            buffer + bytes_read);
  }

  // Create a memory file descriptor using memfd_create
  int fd = memfd_create("shared_library", 0);
  if (fd == -1) {
    perror("memfd_create failed");
    return 1;
  }

  // Write the shared library contents to the file descriptor
  if (write(fd, library_contents.data(), library_contents.size()) !=
      static_cast<ssize_t>(library_contents.size())) {
    perror("write failed");
    return 1;
  }

  // Create a path to the file descriptor using /proc/self/fd
  // https://sourceware.org/bugzilla/show_bug.cgi?id=30100#c33
  char path[100]; // > 35 == strlen("/proc/self/fd/") + log10(pow(2, 64)) + 1
  snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

  // Use dlopen to dynamically load the shared library
  void *handle = dlopen(path, RTLD_NOW);
  if (handle == NULL) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // Use the shared library...
  // Get a pointer to the function "int add(int, int)"
  int (*add)(int, int) =
      reinterpret_cast<int (*)(int, int)>(dlsym(handle, "add"));

  if (add == NULL) {
    fprintf(stderr, "dlsym failed: %s\n", dlerror());
    return 1;
  }

  // Call the function "int add(int, int)"
  printf("add(1, 2) = %d\n", add(1, 2));

  // Cleanup
  dlclose(handle);
  close(fd);
  return 0;
}

羁〃客ぐ 2024-10-25 08:22:51

Loading a solib from memory has an inherent limitation: the DT_NEEDED deps of a solib cannot refer to a memory buffer. This means, among other things, that you can't easily load a solib with deps from a memory buffer. I'm afraid that unless the ELF specification is extended to allow DT_NEEDED to refer to objects other than file names, there will be no standard API for loading a solib from a memory buffer.

I think you need to use POSIX shm_open(), mmap the shared memory, generate your solib there, and then use plain dlopen() via the /dev/shm mount point. This way the deps can be handled too: they can refer to regular files or to the /dev/shm objects that hold your generated solibs.
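A rough sketch of that idea, assuming Linux with glibc, where POSIX shared-memory objects show up under /dev/shm (that path mapping is a Linux convention, not something POSIX guarantees). The function name dlopen_via_shm is hypothetical, and for brevity it fills the object with write() rather than mmap:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: publish `image` as the shared-memory object `name`
 * (which must start with '/', e.g. "/plugin.so") and dlopen it through
 * /dev/shm. Returns the dlopen handle, or NULL on any failure. */
void *dlopen_via_shm(const char *name, const void *image, size_t len) {
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0700);
    if (fd == -1)
        return NULL;

    /* Copy the library image into the shared-memory object. */
    const char *p = image;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0)
            break;
        p += n;
        left -= (size_t)n;
    }
    close(fd);

    void *handle = NULL;
    if (left == 0) {
        char path[256];
        int w = snprintf(path, sizeof path, "/dev/shm%s", name);
        if (w > 0 && (size_t)w < sizeof path)
            handle = dlopen(path, RTLD_NOW);
    }
    /* Remove the name; an already loaded library stays mapped. */
    shm_unlink(name);
    return handle;
}
```

If other generated libraries are supposed to find this one by path later, you would keep the object around instead of calling shm_unlink right away. On older glibc versions this may need -lrt and -ldl at link time.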

违心° 2024-10-25 08:22:51

You don't need to load the code generated in memory, since it is already in memory!

However, you can, in a non-portable way, generate machine code in memory (provided it is in a memory segment mmap-ed with the PROT_EXEC flag).

(In that case, no "linking" or relocation step is required, since you generate machine code with definitive absolute or relative addresses, in particular to call external functions.)

Some libraries exist that do this. On GNU/Linux under x86 or x86-64, I know of GNU Lightning (which quickly generates machine code that runs slowly), DotGNU LibJIT (which generates medium-quality code), and LLVM & GCCJIT (which are able to generate quite optimized code in memory, but take time to emit it). LuaJIT has a similar facility too. Since 2015, GCC 5 has had a gccjit library.

And of course, you can still generate C code in a file, fork a compiler to compile it into a shared object, and dlopen that shared object file. I'm doing that in GCC MELT, a domain-specific language to extend GCC. It does work quite well in practice.

Addendum

If the performance of writing the generated C file is a concern (it should not be, since compiling a C file is much slower than writing it), consider using a tmpfs file system for that (perhaps /tmp/, which is often a tmpfs filesystem on Linux).
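For illustration, the "generate C, run the compiler, dlopen the result" route could look roughly like the sketch below. It assumes a compiler is reachable as cc on PATH, that /tmp is writable, and that glibc's mkstemps is available; compile_and_load is a made-up name, and this is not how GCC MELT itself does it:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `c_source` to a temporary .c file, compile it
 * into a shared object with the system compiler, and dlopen the result.
 * Returns the dlopen handle, or NULL on any failure. */
void *compile_and_load(const char *c_source) {
    char src[] = "/tmp/generatedXXXXXX.c";
    int fd = mkstemps(src, 2);              /* 2 == strlen(".c") */
    if (fd == -1)
        return NULL;
    FILE *f = fdopen(fd, "w");
    if (f == NULL) {
        close(fd);
        unlink(src);
        return NULL;
    }
    int write_ok = fputs(c_source, f) >= 0;
    if (fclose(f) != 0)
        write_ok = 0;

    char so[sizeof src + 4];                /* room for the ".so" suffix */
    snprintf(so, sizeof so, "%s.so", src);

    void *handle = NULL;
    if (write_ok) {
        char cmd[256];
        snprintf(cmd, sizeof cmd, "cc -shared -fPIC -o %s %s", so, src);
        if (system(cmd) == 0)
            handle = dlopen(so, RTLD_NOW);
    }
    /* The files are no longer needed once the library is mapped. */
    unlink(src);
    unlink(so);
    return handle;
}
```

A caller would then look up symbols with dlsym on the returned handle, for example dlsym(handle, "add") for a source string that defines int add(int, int).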

痴梦一场 2024-10-25 08:22:51

It should be noted that when you use shm_open + dlopen to load a dynamic library from shared memory, the load will fail if /dev/shm is mounted with noexec.
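One way to detect this up front, sketched under the assumption of glibc on Linux (ST_NOEXEC is a GNU extension to statvfs, and is_mounted_noexec is a made-up helper name):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* ST_NOEXEC is only exposed with GNU extensions */
#endif
#include <sys/statvfs.h>

/* Hypothetical helper: returns 1 if the filesystem containing `path` is
 * mounted noexec, 0 if it is not, and -1 if statvfs fails. */
int is_mounted_noexec(const char *path) {
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;
    return (vfs.f_flag & ST_NOEXEC) ? 1 : 0;
}
```

Calling is_mounted_noexec("/dev/shm") before going the shm_open route lets you fall back to another location when it returns 1.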

风渺 2024-10-25 08:22:51

I found a solution to this: create a memory file using memfd_create and then open it with dlopen.

https://x-c3ll.github.io/posts/fileless-memfd_create/
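As a condensed C sketch of that technique (not taken from the linked post): it assumes Linux with glibc 2.27 or later for the memfd_create wrapper and a mounted /proc, and dlopen_from_memory is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* memfd_create is a GNU/Linux extension */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy a shared-library image into an anonymous
 * memory file and dlopen it via /proc/self/fd/<fd>.
 * Returns the dlopen handle, or NULL on any failure. */
void *dlopen_from_memory(const void *image, size_t len) {
    int fd = memfd_create("in_memory_library", MFD_CLOEXEC);
    if (fd == -1)
        return NULL;

    /* write() may be partial, so loop until the whole image is copied. */
    const char *p = image;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return NULL;
        }
        p += n;
        left -= (size_t)n;
    }

    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    void *handle = dlopen(path, RTLD_NOW);

    /* The loader keeps its own mappings, so the descriptor can be closed. */
    close(fd);
    return handle;
}
```

The returned handle is used like any other: dlsym to look up symbols, dlclose when done.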
