PyCUDA: wrong indexing of a numpy array of integers

Posted on 2025-02-13 02:41:56

I am taking my first steps with PyCUDA to do some parallel computation, and I came across a behavior I do not understand.
I started from the very basic tutorial on the official PyCUDA website (a simple script that doubles all elements of an array: https://documen.tician.de/pycuda/tutorial.html). The code is the following:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4,4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
  __global__ void doublify(float *a)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
  }
  """)
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

It is quite clear and it works. An example result is:

[[-1.9951048  -1.7537887  -1.3228793  -1.1585734 ]
 [-0.96863186 -1.7235669  -0.3331826  -1.1527038 ]
 [ 2.4142797  -0.35531005  1.8844942   3.996446  ]
 [ 1.400629   -2.7957075  -0.78042877  0.13829945]]
[[-0.9975524  -0.87689435 -0.66143966 -0.5792867 ]
 [-0.48431593 -0.86178344 -0.1665913  -0.5763519 ]
 [ 1.2071398  -0.17765503  0.9422471   1.998223  ]
 [ 0.7003145  -1.3978537  -0.39021438  0.06914973]]

But then I tried to modify the code slightly to deal with integers:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

a = numpy.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]])

a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
  __global__ void doublify(int *a)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
  }
  """)
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

... and this does not work. Only part of the 2D array (the first two rows) is multiplied by 2; the rest is unchanged. Example result:

[[2 4 6 8]
 [2 4 6 8]
 [1 2 3 4]
 [1 2 3 4]]
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]

Why is this happening? What is the difference between the tutorial and the modified code?

Thanks to all!

Comments (1)

新一帅帅 2025-02-20 02:41:56

OK, so I kind of solved it by staying with the float type, even though I need to work with integers. Apparently there is some behind-the-scenes mechanism when allocating memory for integers that does not fit with PyCuda.
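
A likely explanation, offered as an assumption rather than something confirmed in this thread, is a dtype mismatch. The tutorial casts explicitly with a.astype(numpy.float32), which matches the kernel's 4-byte float. The modified script does not cast, and numpy.array on plain Python integers typically produces 8-byte int64 values on 64-bit Linux, while the kernel's int is 4 bytes wide. The sketch below needs no GPU: emulate_doublify is a hypothetical host-side helper that reinterprets the array bytes the way an int * would see them and doubles the first 16 of them, one per thread of the (4,4,1) block. It assumes a little-endian machine.

```python
import numpy

# The integer array from the question, with the dtype written out explicitly.
# numpy.int64 is what numpy.array([[1, 2, ...]]) gives by default on typical
# 64-bit Linux installs (an assumption about the asker's platform).
rows = [[1, 2, 3, 4]] * 4
a64 = numpy.array(rows, dtype=numpy.int64)
a32 = numpy.array(rows, dtype=numpy.int32)

print(a64.itemsize, a64.nbytes)  # 8 bytes per element, 128 bytes in total
print(a32.itemsize, a32.nbytes)  # 4 bytes per element, 64 bytes in total


def emulate_doublify(a, n_threads=16):
    """Host-side stand-in (hypothetical helper) for the `int *a` kernel.

    The raw bytes of `a` are reinterpreted as 32-bit ints, and the first
    `n_threads` of them are doubled, one per emulated thread.
    """
    out = a.copy()
    words = out.reshape(-1).view(numpy.int32)  # what an `int *` would see
    words[:n_threads] *= 2
    return out


# Little-endian assumption: each int64 is stored as (low word, high word),
# so 16 doubled 32-bit words cover only the first 8 of the 16 int64 values.
print(emulate_doublify(a64))  # only the first two rows change
print(emulate_doublify(a32))  # every element is doubled
```

If this is indeed the cause, creating the array with numpy.array(..., dtype=numpy.int32), or calling a.astype(numpy.int32) before cuda.mem_alloc, should let the original int *a kernel double all 16 elements, so there would be no need to fall back to floats.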
