PyCUDA - passing a matrix by reference from Python to C++ CUDA code
I have to write a PyCUDA function that takes two matrices, Nx3 and Mx3, and returns an NxM matrix, but I can't figure out how to pass a matrix by reference without knowing its number of columns.
My code is basically something like this:
#kernel declaration
mod = SourceModule("""
__global__ void distance(int N, int M, float d1[][3], float d2[][3], float res[][M])
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    float x, y, z;
    x = d2[j][0]-d1[i][0];
    y = d2[j][1]-d1[i][1];
    z = d2[j][2]-d1[i][2];
    res[i][j] = x*x + y*y + z*z;
}
""")
#load data
data1 = numpy.loadtxt("data1.txt").astype(numpy.float32) # Nx3 matrix
data2 = numpy.loadtxt("data2.txt").astype(numpy.float32) # Mx3 matrix
N=data1.shape[0]
M=data2.shape[0]
res = numpy.zeros([N,M]).astype(numpy.float32) # NxM matrix
#invoke kernel
dist_gpu = mod.get_function("distance")
dist_gpu(cuda.In(numpy.int32(N)), cuda.In(numpy.int32(M)), cuda.In(data1), cuda.In(data2), cuda.Out(res), block=(N,M,1))
#save data
numpy.savetxt("results.txt", res)
Compiling this, I receive an error:
kernel.cu(3): error: a parameter is not allowed
That is, I cannot use M as the number of columns of res[][] in the function declaration, nor can I leave the number of columns undeclared.
I need an NxM matrix as output, but I can't figure out how to do this. Can you help me?
1 Answer
You should use pitched linear memory access inside the kernel; that is how ndarray and gpuarray store data internally, and PyCUDA will pass a pointer to the data in GPU memory allocated for a gpuarray when it is supplied as an argument to a PyCUDA kernel. So (if I understand what you are trying to do) your kernel should be written as something like:
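(The kernel listing from the original answer did not survive extraction; the sketch below only illustrates the approach described, using flat float* arguments and a row-major idx2d helper. The argument order and names are assumptions for illustration, not the answer's original code.)

__device__ unsigned int idx2d(int i, int j, int lda)
{
    // row-major linear index of element (i, j) with leading dimension lda
    return j + i * lda;
}

__global__ void distance(float *d1, float *d2, float *res, int N, int M)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;   // row of d1 and row of res
    int j = threadIdx.y + blockDim.y * blockIdx.y;   // row of d2, column of res

    if (i < N && j < M) {
        float x = d2[idx2d(j,0,3)] - d1[idx2d(i,0,3)];
        float y = d2[idx2d(j,1,3)] - d1[idx2d(i,1,3)];
        float z = d2[idx2d(j,2,3)] - d1[idx2d(i,2,3)];
        res[idx2d(i,j,M)] = x*x + y*y + z*z;
    }
}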
Here I have assumed the numpy default row-major ordering in defining the idx2d helper function. There are still problems with the Python side of the code you posted, but I guess you know that already.
EDIT: Here is a complete working repro case based on the code posted in your question. Note that it only uses a single block (like the original), so be mindful of block and grid dimensions when trying to run it on anything other than trivially small cases.
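(The repro listing was likewise lost in extraction; the following is a self-contained sketch of what such a single-block repro might look like, with randomly generated test data standing in for the original data1.txt / data2.txt files. The idx2d helper and the problem sizes N=16, M=12 are assumptions for illustration.)

# a sketch, not the answer's original listing
import numpy
import pycuda.autoinit          # creates a context on the default device
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__device__ unsigned int idx2d(int i, int j, int lda)
{
    return j + i * lda;   /* row-major linear index */
}

__global__ void distance(float *d1, float *d2, float *res, int N, int M)
{
    int i = threadIdx.x;   /* single block only, like the original */
    int j = threadIdx.y;

    if (i < N && j < M) {
        float x = d2[idx2d(j,0,3)] - d1[idx2d(i,0,3)];
        float y = d2[idx2d(j,1,3)] - d1[idx2d(i,1,3)];
        float z = d2[idx2d(j,2,3)] - d1[idx2d(i,2,3)];
        res[idx2d(i,j,M)] = x*x + y*y + z*z;
    }
}
""")

N, M = 16, 12                                            # keep N*M <= 1024 for one block
data1 = numpy.random.rand(N, 3).astype(numpy.float32)    # Nx3
data2 = numpy.random.rand(M, 3).astype(numpy.float32)    # Mx3
res = numpy.zeros((N, M), dtype=numpy.float32)           # NxM output

dist_gpu = mod.get_function("distance")
dist_gpu(cuda.In(data1), cuda.In(data2), cuda.Out(res),
         numpy.int32(N), numpy.int32(M),
         block=(N, M, 1), grid=(1, 1))

# sanity check against a CPU computation of the squared distances
ref = ((data1[:, None, :] - data2[None, :, :]) ** 2).sum(axis=2)
assert numpy.allclose(res, ref, atol=1e-5)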