How do I create page-locked memory from an existing numpy array in PyCUDA?

Posted 2024-12-08 16:12:01


The PyCUDA documentation explains how to create an empty or zeroed page-locked array, but not how to move(?) an existing numpy array into page-locked memory. Do I need to get a pointer to the numpy array and pass it to pycuda.driver.PagelockedHostAllocation? And how would I do that?

UPDATE

<-- snipped -->

UPDATE 2

Thanks, talonmies, for your help. The memory transfer is now page-locked, but the program ends with the following error:

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: invalid context

This is the updated code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import ctypes
from pycuda import driver, compiler, gpuarray
from pycuda.tools import PageLockedMemoryPool
import pycuda.autoinit

memorypool = PageLockedMemoryPool()

indata = np.random.randn(5).astype(np.float32)
outdata = gpuarray.zeros(5, dtype=np.float32)

# Allocate a page-locked host buffer and copy the numpy data into it.
pinnedinput = memorypool.allocate(indata.shape, np.float32)

source = indata.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
dest = pinnedinput.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
sz = indata.size * ctypes.sizeof(ctypes.c_float)
ctypes.memmove(dest, source, sz)

kernel_code = """
__global__ void kernel(float *indata, float *outdata) {
    int globalid = blockIdx.x * blockDim.x + threadIdx.x;
    outdata[globalid] = indata[globalid] + 1.0f;
}
"""

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")

kernel(
    driver.In(pinnedinput), outdata,
    grid=(5, 1),
    block=(1, 1, 1),
)
print(indata)
print(outdata.get())
memorypool.free_held()

3 Answers

无声情话 2024-12-15 16:12:01


You will need to copy the data from your source array to the array holding the page-locked allocation returned from PyCUDA. The most straightforward way to do that is via ctypes:

import numpy
import ctypes

x = numpy.array([1, 2, 3, 4], dtype=numpy.double)
y = numpy.zeros_like(x)

source = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
dest = y.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
sz = x.size * ctypes.sizeof(ctypes.c_double)

ctypes.memmove(dest, source, sz)

print(y)

The numpy.ctypes interface can be used to get a pointer to the memory used to hold an array's data, and ctypes.memmove can then be used to copy between two different ndarrays. All the usual caveats of working with naked C pointers apply, so some care is required, but it is straightforward enough to use.
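The same byte-level copy can be exercised without numpy at all, which makes the memmove semantics easy to inspect. This is a minimal sketch using plain ctypes buffers as hypothetical stand-ins for the two ndarrays above:

```python
import ctypes

# Two same-sized C double buffers: src holds the data, dst starts zeroed,
# mirroring the x/y pair in the numpy example.
src = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
dst = (ctypes.c_double * 4)()            # ctypes zero-initializes this

nbytes = ctypes.sizeof(src)              # 4 doubles * 8 bytes = 32 bytes
ctypes.memmove(dst, src, nbytes)         # raw byte copy, no bounds checking

print(list(dst))                         # [1.0, 2.0, 3.0, 4.0]
```

Note that memmove trusts the size you pass it; getting `nbytes` from `ctypes.sizeof` (or `itemsize * size` on the numpy side) rather than hard-coding it is what keeps the copy within bounds.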

罗罗贝儿 2024-12-15 16:12:01


The memory block is still active. You can explicitly free the pinned array before the context is torn down:

print(memorypool.active_blocks)
pinnedinput.base.free()
print(memorypool.active_blocks)
memorypool.free_held()

不醒的梦 2024-12-15 16:12:01


I've been doing this in a much simpler way:

locked_ary = cuda.pagelocked_empty_like(ary, mem_flags=cuda.host_alloc_flags.DEVICEMAP)
locked_ary[:] = ary

The result has the right AlignedHostAllocation base, and timings are identical to what I get by using ctypes.memmove.
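The reason the slice assignment preserves the page-locked base is that `locked_ary[:] = ary` copies element-wise into the existing buffer rather than rebinding the name to a new array. That property can be sketched without PyCUDA, using a plain `array.array` as a hypothetical stand-in for the pinned buffer:

```python
from array import array

# Stand-in for a page-locked buffer: the point is that slice assignment
# writes into the allocation we already have, so a special (pinned) base
# would survive the copy.
locked = array('f', [0.0] * 4)           # pretend this is pinned memory
src = array('f', [1.0, 2.0, 3.0, 4.0])

buf_before = locked.buffer_info()[0]     # address of the underlying buffer
locked[:] = src                          # equal-length copy, same buffer
buf_after = locked.buffer_info()[0]

print(buf_before == buf_after)           # True: allocation unchanged
print(list(locked))                      # [1.0, 2.0, 3.0, 4.0]
```

A plain `locked = src` would instead rebind the name and drop the page-locked allocation entirely, which is why the in-place form matters here.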
