How do I create page-locked memory from an existing numpy array in PyCUDA?
The PyCUDA help explains how to create an empty or zeroed array, but not how to move (?) an existing numpy array into page-locked memory. Do I need to get a pointer to the numpy array and pass it to pycuda.driver.PagelockedHostAllocation? And how would I do that?
UPDATE
<-- snipped -->
UPDATE 2
Thanks, talonmies, for your help. Now the memory transfer is page-locked, but the program ends with the following error:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: invalid context
This is the updated code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import ctypes
from pycuda import driver, compiler, gpuarray
from pycuda.tools import PageLockedMemoryPool
import pycuda.autoinit  # creates and activates a CUDA context on import

memorypool = PageLockedMemoryPool()
indata = np.random.randn(5).astype(np.float32)
outdata = gpuarray.zeros(5, dtype=np.float32)

# Allocate a page-locked array from the pool and copy indata into it.
pinnedinput = memorypool.allocate(indata.shape, np.float32)
source = indata.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
dest = pinnedinput.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
sz = indata.size * ctypes.sizeof(ctypes.c_float)
ctypes.memmove(dest, source, sz)

kernel_code = """
__global__ void kernel(float *indata, float *outdata) {
    int globalid = blockIdx.x * blockDim.x + threadIdx.x;
    outdata[globalid] = indata[globalid] + 1.0f;
}
"""
mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")
kernel(
    driver.In(pinnedinput), outdata,
    grid=(5, 1),
    block=(1, 1, 1),
)
print(indata)
print(outdata.get())
memorypool.free_held()
Comments (3)
You will need to copy the data from your source array to the array holding the page-locked allocation returned from PyCUDA. The most straightforward way to do that is via ctypes.
The numpy.ctypes interface can be used to get a pointer to the memory that holds an array's data, and ctypes.memmove can then be used to copy between two different ndarrays. All the usual caveats of working with naked C pointers apply, so some care is required, but it is straightforward enough to use.
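The ctypes copy described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answer's original snippet: the destination here is a plain numpy array standing in for the page-locked one, so the example runs without a GPU. With PyCUDA, dst would instead be the array returned by the page-locked allocator. The names src and dst are hypothetical.

```python
import ctypes

import numpy as np

# Source data living in ordinary pageable memory.
src = np.random.randn(5).astype(np.float32)

# Stand-in for the page-locked array (assumption: with PyCUDA this would be
# the array returned by the page-locked allocator, same shape and dtype).
dst = np.empty_like(src)

# memmove copies raw bytes and knows nothing about strides or dtypes, so both
# arrays must be C-contiguous and occupy the same number of bytes.
assert src.flags["C_CONTIGUOUS"] and dst.flags["C_CONTIGUOUS"]
assert src.nbytes == dst.nbytes

# numpy's ctypes interface exposes the address of each array's data buffer.
src_ptr = src.ctypes.data_as(ctypes.c_void_p)
dst_ptr = dst.ctypes.data_as(ctypes.c_void_p)

# Copy src.nbytes bytes from the source buffer into the destination buffer.
ctypes.memmove(dst_ptr, src_ptr, src.nbytes)
```

The two assertions are there because of the "naked pointer" caveat: a mismatch in size or layout would silently corrupt memory rather than raise an error.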
The memory block is still active. You might explicitly free the pinned array:
I've been doing this in a much simpler way:
The result has the right AlignedHostAllocation base, and timings are identical to what I get by using ctypes.memmove.
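The snippet that belonged to this answer did not survive, so what follows is only a guess at the kind of "simpler way" it means: numpy slice assignment, which copies into the destination's existing buffer without touching raw pointers. As an assumption for illustration, dst is a plain numpy array standing in for an already allocated page-locked array, which lets the sketch run without a GPU.

```python
import numpy as np

# Source data living in ordinary pageable memory.
src = np.random.randn(5).astype(np.float32)

# Stand-in for an already allocated page-locked array (assumption).
dst = np.empty_like(src)
address_before = dst.ctypes.data

# Slice assignment writes into dst's existing buffer. A plain `dst = src`
# would only rebind the name and lose the page-locked allocation.
dst[:] = src

address_after = dst.ctypes.data
```

Because the buffer address is unchanged after the assignment, the destination keeps whatever allocation object sits behind it, which matches the answer's remark that the result still has the right base.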