PyCUDA messes up numpy matrix transpose

Posted 2024-11-27 08:33:26 | Word count: 780 | Views: 4 | Comments: 0

Why does the transposed matrix look different when converted to a pycuda.gpuarray?

Can you reproduce this? What could cause this? Am I using the wrong approach?

Example code

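from pycuda import gpuarray
import pycuda.autoinit  # creates the CUDA context on import
import numpy

# 2x4 float32 host array
data = numpy.random.randn(2, 4).astype(numpy.float32)
# upload the transposed view, then compare the round trip with data.T
data_gpu = gpuarray.to_gpu(data.T)
print("data", data, sep="\n")
print("data_gpu.get()", data_gpu.get(), sep="\n")
print("data.T", data.T, sep="\n")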

Output

data
[[ 0.70442784  0.08845157 -0.84840715 -1.81618035]
 [ 0.55292499  0.54911566  0.54672164  0.05098847]]
data_gpu.get()
[[ 0.70442784  0.08845157]
 [-0.84840715 -1.81618035]
 [ 0.55292499  0.54911566]
 [ 0.54672164  0.05098847]]
data.T
[[ 0.70442784  0.55292499]
 [ 0.08845157  0.54911566]
 [-0.84840715  0.54672164]
 [-1.81618035  0.05098847]]

Comments (2)

养猫人 2024-12-04 08:33:26

The basic reason is that numpy transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA directly accesses when a copy is performed to device memory. The solution is to use the copy method when doing the transpose, which will create an array with data in the transposed order in host memory, then copy that to the device:

data_gpu = gpuarray.to_gpu(data.T.copy())
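
To see the view-versus-copy distinction without a GPU, here is a minimal numpy-only sketch. The names view and fixed are just illustrative, and arange data stands in for the random matrix from the question:

```python
import numpy

data = numpy.arange(8, dtype=numpy.float32).reshape(2, 4)
view = data.T           # transpose view: no data is moved
fixed = data.T.copy()   # new array stored in transposed (C) order

# The view reuses data's buffer and is not C-contiguous.
print(numpy.shares_memory(data, view), view.flags['C_CONTIGUOUS'])
# The copy owns a fresh buffer that is C-contiguous.
print(numpy.shares_memory(data, fixed), fixed.flags['C_CONTIGUOUS'])
```

Both arrays compare equal element by element; only the memory layout differs, and the layout is what a raw buffer transfer sees.
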
夏末 2024-12-04 08:33:26

In numpy, data.T doesn't do anything to the underlying 1D array. It simply manipulates the strides to obtain the transpose. This makes it a constant-time and constant-memory operation.
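
The stride manipulation can be inspected directly. A small sketch, assuming a 2x4 float32 array like the one in the question (4 bytes per element):

```python
import numpy

data = numpy.zeros((2, 4), dtype=numpy.float32)

# Strides are the byte steps taken per axis when walking the buffer.
print(data.shape, data.strides)      # (2, 4) (16, 4)
print(data.T.shape, data.T.strides)  # (4, 2) (4, 16)
```

The transpose only swaps the shape and stride tuples; the buffer itself is untouched.
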

It would appear that gpuarray.to_gpu() isn't respecting the strides and is simply copying the underlying 1D array. This would produce the exact behaviour you're observing.
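
This hypothesis can be checked on the host alone. The sketch below is not PyCUDA's actual code; naive is a hypothetical stand-in for a stride-unaware transfer, which takes the buffer in memory order and relabels it with the shape of the transposed view. The values are the ones printed in the question:

```python
import numpy

data = numpy.array([[0.70442784, 0.08845157, -0.84840715, -1.81618035],
                    [0.55292499, 0.54911566, 0.54672164, 0.05098847]],
                   dtype=numpy.float32)

# data is C-contiguous, so ravel() yields the elements in buffer order.
raw = data.ravel()
# Relabel the unchanged buffer with the transposed shape (4, 2).
naive = raw.reshape(data.T.shape)
print(naive)
```

The printed rows pair up consecutive buffer elements, which is the same pattern as the data_gpu.get() output in the question rather than the true transpose.
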

In my view there is nothing wrong with your code. Rather, I would consider this a bug in pycuda.

I've googled around, and have found a thread that discusses this issue in detail.

As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in the host RAM.
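
A short host-only sketch of what the workaround does to memory. The names contiguous_t and same are illustrative, and the GPU upload itself is left out so the snippet runs without a device:

```python
import numpy

data = numpy.arange(8, dtype=numpy.float32).reshape(2, 4)

# data.T is not C-contiguous, so this allocates a second host buffer.
contiguous_t = numpy.ascontiguousarray(data.T)
# data is already C-contiguous, so no new buffer is needed here.
same = numpy.ascontiguousarray(data)

print(contiguous_t.flags['C_CONTIGUOUS'], numpy.shares_memory(contiguous_t, data))
print(numpy.shares_memory(same, data))
```

The extra copy is therefore only paid when the input is not already contiguous; contiguous_t is the array one would hand to gpuarray.to_gpu().
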
