PyCUDA error when indexing a numpy array of integers
I am taking my first steps with PyCUDA to perform some parallel computation, and I have come across a behavior I do not understand.
I started from the very basic tutorial on the PyCUDA official website (a simple script that doubles all elements of an array: https://documen.tician.de/pycuda/tutorial.html). The code is the following:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4,4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
It is quite clear and it works. An example result is:
[[-1.9951048 -1.7537887 -1.3228793 -1.1585734 ]
[-0.96863186 -1.7235669 -0.3331826 -1.1527038 ]
[ 2.4142797 -0.35531005 1.8844942 3.996446 ]
[ 1.400629 -2.7957075 -0.78042877 0.13829945]]
[[-0.9975524 -0.87689435 -0.66143966 -0.5792867 ]
[-0.48431593 -0.86178344 -0.1665913 -0.5763519 ]
[ 1.2071398 -0.17765503 0.9422471 1.998223 ]
[ 0.7003145 -1.3978537 -0.39021438 0.06914973]]
But then I tried to modify the code slightly to deal with integers:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]])
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
__global__ void doublify(int *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
... and this does not work. Only part of the 2D array is multiplied by 2, while the rest is unchanged. Example result:
[[2 4 6 8]
[2 4 6 8]
[1 2 3 4]
[1 2 3 4]]
[[1 2 3 4]
[1 2 3 4]
[1 2 3 4]
[1 2 3 4]]
Why is this happening? What is the difference between the tutorial and the modified code?
Thanks to all!
Comments (1)
OK, so I kind of solved it by staying with the float type, even though I need to work with integers. Apparently there is some behind-the-scenes mechanism when allocating memory for integers, and it does not fit with PyCUDA.
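A minimal sketch of that float workaround, assuming the goal is still to double a small integer matrix: the integer data is cast to float32 before the host-to-device copy (so it matches the float kernel from the tutorial) and cast back to an integer dtype afterwards. The likely cause of the original problem is that numpy.array([[1,2,3,4], ...]) defaults to a 64-bit integer dtype on most platforms, while the kernel indexes 32-bit int, so only half of the buffer gets doubled; any explicit cast that makes the host and device element sizes agree (astype(numpy.int32) with the int kernel, or astype(numpy.float32) as below) avoids that mismatch. The name a_int is only for illustration.
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
# Integer input; cast to float32 so the host element size matches the float kernel.
a_int = numpy.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
a = a_int.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
# Cast the doubled values back to the original integer dtype.
print(a_doubled.astype(a_int.dtype))
print(a_int)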