I'm reading the documentation for Memory Management in Python C extensions, and as far as I can tell, there doesn't really seem to be much reason to use malloc rather than PyMem_Malloc. Say I want to allocate an array that isn't to be exposed to Python source code and will be stored in an object that will be garbage collected. Is there any reason to use malloc?
It's perfectly OK for extensions to allocate memory with malloc or other system allocators. That's normal and inevitable for many types of modules: most modules that wrap other libraries, which themselves know nothing about Python, will trigger native allocations whenever they occur inside the wrapped library. (Some libraries let you control allocation enough to prevent this; most do not.)
There's a serious drawback to using PyMem_Malloc: you need to hold the GIL when using it. Native libraries often want to release the GIL when doing CPU-intensive calculations or making any calls that might block, like I/O. Needing to lock the GIL before allocations can be somewhere between very inconvenient and a performance problem.
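For instance, here is a minimal sketch (the function name and workload are hypothetical) of why that matters: the system malloc can be called with the GIL released, while PyMem_Malloc cannot:

#include <Python.h>
#include <stdlib.h>

/* Hypothetical CPU-bound worker: it allocates with malloc() so the
 * allocation can happen while the GIL is released. */
static PyObject *
crunch(PyObject *self, PyObject *arg)
{
    Py_ssize_t n = PyLong_AsSsize_t(arg);
    if (n == -1 && PyErr_Occurred())
        return NULL;

    double *buf;
    Py_BEGIN_ALLOW_THREADS              /* GIL released */
    buf = malloc((size_t)n * sizeof(double));
    if (buf != NULL) {
        /* ... long CPU-bound work on buf, no Python API calls ... */
    }
    Py_END_ALLOW_THREADS                /* GIL re-acquired */

    if (buf == NULL)
        return PyErr_NoMemory();
    free(buf);                          /* match malloc with free */
    Py_RETURN_NONE;
}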
Using Python's wrappers for memory allocation lets Python's memory-debugging code be used. Given tools like Valgrind, though, I doubt the real-world value of that.
You'll need to use these functions if an API requires it; for example, if an API is passed a pointer that must be allocated with these functions, so it can be freed with them. Barring an explicit reason like that for using them, I stick with normal allocation.
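A concrete example of that case: PyUnicode_AsWideCharString() hands back a buffer allocated by Python's allocator, and the API requires it to be released with PyMem_Free():

#include <Python.h>
#include <wchar.h>

/* The returned buffer comes from Python's allocator, so PyMem_Free()
 * (not free()) must release it. */
static int
print_wide(PyObject *s)
{
    Py_ssize_t size;
    wchar_t *w = PyUnicode_AsWideCharString(s, &size);
    if (w == NULL)
        return -1;
    wprintf(L"%ls\n", w);
    PyMem_Free(w);      /* matching deallocator required by the API */
    return 0;
}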
From my experience writing MATLAB .mex functions, I think the biggest determining factor in whether you use malloc or not is portability. Say you have a header file that performs a load of useful functions using plain C data types only (no Python object interaction necessary, so no problem using malloc), and you suddenly realise you want to port that header file to a different codebase that has nothing to do with Python whatsoever (maybe it's a project written purely in C). Using malloc would obviously be the much more portable solution; a sketch of that split follows below.
But for code that is purely a Python extension, my initial reaction would be to expect the native C function to perform faster. I have no evidence to back this up :)
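As a sketch of that split (file and function names hypothetical), a Python-agnostic helper written against plain malloc can move to a pure-C project untouched:

/* buffer_utils.h -- hypothetical Python-free helper header.
 * Because it depends only on stdlib, the same file compiles in a
 * pure C project as well as inside a Python extension. */
#include <stdlib.h>

static inline double *
make_buffer(size_t n)
{
    return malloc(n * sizeof(double));   /* caller frees with free() */
}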
EDIT: Corrected the places where PyMem_Malloc and PyObject_Malloc were mixed up; they are two different calls.
Without the PYMALLOC_DEBUG macro activated, PyMem_Malloc is an alias of libc's malloc(), with one special case: calling PyMem_Malloc to allocate zero bytes returns a non-NULL pointer, while malloc(0) might return NULL (source code reference):
/* malloc. Note that nbytes==0 tries
to return a non-NULL pointer, distinct
from all other currently live pointers. This may not be possible.
*/
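A small sketch of that zero-byte difference:

#include <Python.h>
#include <stdlib.h>

void zero_byte_demo(void)
{
    void *p = PyMem_Malloc(0);  /* tries to return a unique non-NULL pointer */
    void *q = malloc(0);        /* may legally return NULL */
    PyMem_Free(p);              /* PyMem_Free(NULL) is also safe */
    free(q);
}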
Also, there is an advisory note in the pymem.h header file:
Never mix calls to PyMem_ with calls to the platform malloc/realloc/calloc/free. For example, on Windows different DLLs may end up using different heaps, and if you use PyMem_Malloc you'll get the memory from the heap used by the Python DLL; it could be a disaster if you free()'ed that directly in your own extension. Using PyMem_Free instead ensures Python can return the memory to the proper heap. As another example, in PYMALLOC_DEBUG mode, Python wraps all calls to all PyMem_ and PyObject_ memory functions in special debugging wrappers that add additional debugging info to dynamic memory blocks. The system routines have no idea what to do with that stuff, and the Python wrappers have no idea what to do with raw blocks obtained directly by the system routines then.
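In code, the pairing rule from that note looks like this (a minimal sketch):

#include <Python.h>

void pairing_demo(void)
{
    char *buf = PyMem_Malloc(64);
    if (buf == NULL)
        return;
    /* ... use buf ... */
    PyMem_Free(buf);    /* correct: same allocator family */
    /* free(buf);          WRONG: may hit a different heap on Windows */
}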
Then, there are some Python-specific tunings inside PyObject_Malloc, the function used not only by C extensions but for all the dynamic allocations made while running a Python program, like 100*234, str(100) or 10 + 4j:
The previous complex() instances are small objects allocated on a dedicated pool.
Small-object (<256 bytes) allocation with PyObject_Malloc is quite efficient, since it's done from pools of 8-byte-aligned blocks, with one pool per block size. There are also Page and Arena blocks for bigger allocations.
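For example, a small request like the one below is served from those pools rather than from the system allocator (a sketch; sizes per the description above):

#include <Python.h>

void pool_demo(void)
{
    /* 24 bytes is well under the small-object threshold, so pymalloc
     * serves it from an 8-byte-aligned block in the 24-byte pool. */
    void *p = PyObject_Malloc(24);
    PyObject_Free(p);
}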
This comment on the source code explains how the PyObject_Malloc call is optimized:
/*
* The basic blocks are ordered by decreasing execution frequency,
* which minimizes the number of jumps in the most common cases,
* improves branching prediction and instruction scheduling (small
* block allocations typically result in a couple of instructions).
* Unless the optimizer reorders everything, being too smart...
*/
Pools, Pages and Arenas are optimizations intended to reduce external memory fragmentation in long-running Python programs.
Check out the source code for the full detailed documentation on Python's memory internals.