Is there any reason to use malloc instead of PyMem_Malloc?

Posted 2024-10-14 13:08:39

I'm reading the documentation for Memory Management in Python C extensions, and as far as I can tell, there doesn't really seem to be much reason to use malloc rather than PyMem_Malloc. Say I want to allocate an array that isn't to be exposed to Python source code and will be stored in an object that will be garbage collected. Is there any reason to use malloc?


Answers (3)

忘年祭陌 2024-10-21 13:08:40

It's perfectly OK for extensions to allocate memory with malloc, or other system allocators. That's normal and inevitable for many types of modules--most modules that wrap other libraries, which themselves know nothing about Python, will cause native allocations when they happen within that library. (Some libraries allow you to control allocation enough to prevent this; most do not.)

There's a serious drawback to using PyMem_Malloc: you need to hold the GIL when using it. Native libraries often want to release the GIL when doing CPU-intensive calculations or making any calls that might block, like I/O. Needing to lock the GIL before allocations can be somewhere between very inconvenient and a performance problem.

Using Python's wrappers for memory allocation allows Python's memory debugging code to be used. With tools like Valgrind I doubt the real-world value of that, however.

You'll need to use these functions if an API requires it; for example, if an API is passed a pointer that must be allocated with these functions, so it can be freed with them. Barring an explicit reason like that for using them, I stick with normal allocation.

半暖夏伤 2024-10-21 13:08:40

From my experience writing MATLAB .mex functions, I think the biggest determining factor in whether you use malloc or not is portability. Say you have a header file that performs a load of useful functions using internal C data types only (no necessary Python object interaction, so no problem using malloc). If you suddenly realise you want to port that header file to a different codebase that has nothing to do with Python whatsoever (maybe it's a project written purely in C), using malloc would obviously be the more portable solution.

But for code that is purely a Python extension, my initial reaction would be to expect the native C function to perform faster. I have no evidence to back this up :)

ゞ记忆︶ㄣ 2024-10-21 13:08:39

EDIT: fixed places where PyMem_Malloc and PyObject_Malloc were mixed up; they are two different calls.

Without the PYMALLOC_DEBUG macro activated, PyMem_Malloc is an alias of libc's malloc(), with one special case: calling PyMem_Malloc to allocate zero bytes will return a non-NULL pointer, while malloc(zero_bytes) may return either NULL or a valid pointer, depending on the platform (source code reference):

/* malloc.  Note that nbytes==0 tries to return a non-NULL pointer, distinct
 * from all other currently live pointers.  This may not be possible.
 */

Also, there is an advisory note on the pymem.h header file:

Never mix calls to PyMem_ with calls to the platform malloc/realloc/calloc/free. For example, on Windows different DLLs may end up using different heaps, and if you use PyMem_Malloc you'll get the memory from the heap used by the Python DLL; it could be a disaster if you free()'ed that directly in your own extension. Using PyMem_Free instead ensures Python can return the memory to the proper heap. As another example, in PYMALLOC_DEBUG mode, Python wraps all calls to all PyMem_ and PyObject_ memory functions in special debugging wrappers that add additional debugging info to dynamic memory blocks. The system routines have no idea what to do with that stuff, and the Python wrappers have no idea what to do with raw blocks obtained directly by the system routines then.

Then, there are some Python-specific tunings inside PyMem_Malloc and PyObject_Malloc, functions used not only for C extensions but for all the dynamic allocations made while running a Python program, like 100*234, str(100) or 10 + 4j:

>>> id(10 + 4j)
139721697591440
>>> id(10 + 4j)
139721697591504
>>> id(10 + 4j)
139721697591440

The previous complex() instances are small objects allocated on a dedicated pool.

Small-object (<256 bytes) allocation with PyMem_Malloc/PyObject_Malloc is quite efficient, since it is served from pools of 8-byte-aligned blocks, with one pool per block size class. There are also Pages and Arenas blocks for bigger allocations.

This comment on the source code explains how the PyObject_Malloc call is optimized:

/*
 * The basic blocks are ordered by decreasing execution frequency,
 * which minimizes the number of jumps in the most common cases,
 * improves branching prediction and instruction scheduling (small
 * block allocations typically result in a couple of instructions).
 * Unless the optimizer reorders everything, being too smart...
 */

Pools, Pages and Arenas are optimizations intended to reduce the external memory fragmentation of long-running Python programs.

Check out the source code for the full detailed documentation on Python's memory internals.
