Can I force a numpy ndarray to take ownership of its memory?

Posted 2024-12-24 16:32:49

I have a C function that mallocs() and populates a 2D array of floats. It "returns" that address and the size of the array. The signature is

int get_array_c(float** addr, int* nrows, int* ncols);

I want to call it from Python, so I use ctypes.

import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c
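For reference, a minimal sketch of declaring the types for the signature above with ctypes, so calls are type-checked instead of relying only on a hand-written wrapper. `GetArrayProto` and `call_get_array` are hypothetical names, and since the question doesn't say what the int return value means, the helper simply hands it back.

```python
import ctypes

c_float_p = ctypes.POINTER(ctypes.c_float)
c_int_p = ctypes.POINTER(ctypes.c_int)

# Prototype for: int get_array_c(float** addr, int* nrows, int* ncols);
GetArrayProto = ctypes.CFUNCTYPE(
    ctypes.c_int,               # return type
    ctypes.POINTER(c_float_p),  # float** addr  (out-parameter)
    c_int_p,                    # int* nrows    (out-parameter)
    c_int_p,                    # int* ncols    (out-parameter)
)


def call_get_array(func):
    """Call a GetArrayProto-typed function and unpack its out-parameters.

    Returns (status, float pointer, nrows, ncols); what the status code
    means is up to the C library.
    """
    addr_ptr = c_float_p()
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    status = func(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    return status, addr_ptr, nrows.value, ncols.value
```

With the real library, `get_array_c = GetArrayProto(("get_array_c", mylib))` should give a typed function; setting `mylib.get_array_c.argtypes = [ctypes.POINTER(c_float_p), c_int_p, c_int_p]` and `mylib.get_array_c.restype = ctypes.c_int` is the equivalent attribute-based form (neither was tested against mylib.so here).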

I never figured out how to specify argument types with ctypes. I tend to just write a Python wrapper for each C function I'm using, and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But it's pretty big, so I want to use the memory allocated by the C function, not copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://stackoverflow.com/a/4355701/3691)

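import ctypes
import numpy

# Python 2 only: PyBuffer_FromMemory belongs to the old buffer API removed in Python 3
buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]
buffer_from_memory.restype = ctypes.py_object

def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    # get_array_c stores the pointer and the dimensions through its out-parameters
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    # .value turns the c_int objects into Python ints (c_int has no arithmetic)
    nbytes = ctypes.sizeof(ctypes.c_float) * nrows.value * ncols.value
    buf = buffer_from_memory(addr_ptr, nbytes)
    return numpy.ndarray((nrows.value, ncols.value), dtype=numpy.float32,
                         order='F', buffer=buf)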

This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.

>>> a = get_array_py()
>>> a.flags.owndata
False

The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like Python to free the buffer memory for me. It seems like if I could force owndata to True, that should do it, but owndata isn't settable.

Unsatisfactory solutions:

- Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.
- Copy the original array into a new numpy array (with its own, separate memory) in get_array_py(), delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it's a memory copy that ought to be unnecessary.

Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.
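Side note on the code above: PyBuffer_FromMemory only exists in Python 2. On Python 3, a sketch of the same zero-copy view can use numpy.ctypeslib.as_array, assuming addr_ptr, nrows and ncols have been filled in as in get_array_py(). `view_column_major` is a hypothetical helper name. Like the original, this view does not own the memory, so by itself it leaks just the same.

```python
import ctypes
import numpy as np


def view_column_major(addr_ptr, nrows, ncols):
    """Zero-copy float32 view of an nrows x ncols column-major block.

    addr_ptr is a ctypes.POINTER(ctypes.c_float). The view does not own the
    memory: nothing frees it when the view is garbage-collected.
    """
    # as_array needs an explicit shape for a pointer. Column-major (nrows, ncols)
    # data has the same layout as row-major (ncols, nrows), so read it that way
    # and transpose; the result is Fortran-contiguous and no bytes are copied.
    return np.ctypeslib.as_array(addr_ptr, shape=(ncols, nrows)).T
```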

ぃ双果 2024-12-31 16:32:49

I just stumbled upon this question, which is still an issue in August 2013. NumPy is really picky about the OWNDATA flag: there is no way to modify it at the Python level, so ctypes will most likely not be able to do this. At the NumPy C-API level (and now we are talking about a completely different way of writing Python extension modules), one has to set the flag explicitly with:

PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

On numpy < 1.7, one had to be even more explicit:

((PyArrayObject*)arr)->flags |= NPY_OWNDATA;

If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.

As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually; see the post "Force NumPy ndarray to take ownership of its memory in Cython".

The relevant documentation is here
and here.
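The OWNDATA flag indeed stays False from ctypes, but the effect the question asks for (memory freed when the array goes away) can be approximated from pure Python. A minimal sketch: wrap the pointer in a small owner object whose `__del__` frees it, and build the array from that object through NumPy's `__array_interface__` protocol, so NumPy keeps the owner alive as the array's base and every view of it holds it too. Assumptions: a POSIX system where `ctypes.CDLL(None)` exposes libc's malloc/free, and get_array_c allocated with that same malloc; if the library uses a different C runtime, exporting a small free function from mylib and passing it as `free` would be safer. `_MallocOwner` and `wrap_column_major` are hypothetical names.

```python
import ctypes
import numpy as np

# Assumption: POSIX, where CDLL(None) exposes the process's libc symbols.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


class _MallocOwner:
    """Hypothetical helper: exposes a malloc'd float32 block, frees it on collection."""

    def __init__(self, addr, shape, free=libc.free):
        self._addr = addr
        self._shape = shape
        self._free = free

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": self._shape,
            "typestr": np.dtype(np.float32).str,
            "data": (self._addr, False),  # (address, read_only=False)
        }

    def __del__(self):
        if self._addr:
            self._free(self._addr)
            self._addr = None


def wrap_column_major(addr, nrows, ncols, free=libc.free):
    """Return an (nrows, ncols) column-major float32 array backed by addr."""
    owner = _MallocOwner(addr, (ncols, nrows), free)
    # NumPy keeps `owner` reachable from the array's base chain, so free()
    # runs only once the last array or view using this memory is gone.
    return np.asarray(owner).T
```

With the question's variables this would be called as `wrap_column_major(ctypes.cast(addr_ptr, ctypes.c_void_p).value, nrows.value, ncols.value)`.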

老旧海报 2024-12-31 16:32:49

I would tend to have two functions exported from my C library:

int get_array_c_nomalloc(float* addr, int nrows, int ncols); /* Pass addr as argument */
int get_array_c(float **addr, int nrows, int ncols); /* Calls function above */

I would then write my Python wrapper[1] of get_array_c to allocate the array, then call get_array_c_nomalloc. Then Python does own the memory. You could integrate this wrapper into your library so your user never has to be aware of get_array_c_nomalloc's existence.

[1] This isn't really a wrapper anymore, but instead is an adapter.
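A minimal sketch of the Python side of this adapter, assuming the hypothetical `get_array_c_nomalloc(float* addr, int nrows, int ncols)` fills `nrows * ncols` floats in column-major order into caller-provided memory. Note that the caller has to know the dimensions before allocating. Because NumPy allocates the buffer, the returned array has `owndata == True` and is freed like any other array. `NoMallocProto` and this `get_array_py(fill, nrows, ncols)` signature are illustrative, not part of the original library.

```python
import ctypes
import numpy as np

c_float_p = ctypes.POINTER(ctypes.c_float)

# Prototype for the answer's hypothetical helper:
#   int get_array_c_nomalloc(float* addr, int nrows, int ncols);
NoMallocProto = ctypes.CFUNCTYPE(ctypes.c_int, c_float_p, ctypes.c_int, ctypes.c_int)


def get_array_py(fill, nrows, ncols):
    """Allocate a NumPy-owned column-major array and let `fill` write into it.

    `fill` is a NoMallocProto-typed function such as the C helper.
    Returns (status, array); the meaning of status is up to the C library.
    """
    out = np.empty((nrows, ncols), dtype=np.float32, order="F")
    # data_as hands C a float* into NumPy's buffer; `out` stays alive during the call.
    status = fill(out.ctypes.data_as(c_float_p), nrows, ncols)
    return status, out
```

With the real library this would be called as `get_array_py(NoMallocProto(("get_array_c_nomalloc", mylib)), nrows, ncols)`.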
