PyCUDA messes up numpy matrix transpose
Why does the transposed matrix look different when converted to a pycuda.gpuarray? Can you reproduce this? What could cause it? Am I using the wrong approach?
Example code
from pycuda import gpuarray
import pycuda.autoinit  # initializes the CUDA context
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)
data_gpu = gpuarray.to_gpu(data.T)

print("data")
print(data)
print("data_gpu.get()")
print(data_gpu.get())
print("data.T")
print(data.T)
Output
data
[[ 0.70442784 0.08845157 -0.84840715 -1.81618035]
[ 0.55292499 0.54911566 0.54672164 0.05098847]]
data_gpu.get()
[[ 0.70442784 0.08845157]
[-0.84840715 -1.81618035]
[ 0.55292499 0.54911566]
[ 0.54672164 0.05098847]]
data.T
[[ 0.70442784 0.55292499]
[ 0.08845157 0.54911566]
[-0.84840715 0.54672164]
[-1.81618035 0.05098847]]
2 Answers
The basic reason is that numpy transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA directly accesses when a copy is performed to device memory. The solution is to use the copy method when doing the transpose, which will create an array with the data in transposed order in host memory, then copy that to the device:
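A minimal sketch of that fix, applied to the question's example (only the to_gpu line changes; .copy() materializes the transposed data into a new contiguous host array before the transfer):

from pycuda import gpuarray
import pycuda.autoinit  # initializes the CUDA context
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# data.T is only a strided view; .copy() materializes it into a new
# C-contiguous host array, so the raw buffer PyCUDA transfers already
# holds the elements in transposed order.
data_gpu = gpuarray.to_gpu(data.T.copy())

print(data_gpu.get())  # now agrees with data.T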
In numpy, data.T doesn't do anything to the underlying 1D array. It simply manipulates the strides to obtain the transpose, which makes it a constant-time and constant-memory operation.
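A short numpy-only check (no GPU required) makes this view behaviour visible; the array shape here just mirrors the question's example:

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)
t = data.T

# The transpose shares the original buffer; only shape and strides differ.
print(data.strides)             # (16, 4): row-major 2x4 float32
print(t.strides)                # (4, 16): axes swapped, no data moved
print(t.base is data)           # True: t is a view onto data's storage
print(t.flags['C_CONTIGUOUS'])  # False: the view is not C-contiguous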
It would appear that pycuda.to_gpu() isn't respecting the strides and is simply copying the underlying 1D array. That would produce the exact behaviour you're observing. In my view there is nothing wrong with your code; rather, I would consider this a bug in pycuda. I've googled around and found a thread that discusses this issue in detail.
As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in host RAM.
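A sketch of that workaround on the question's example (functionally equivalent to the copy-based fix above, since ascontiguousarray returns a new C-contiguous array when given a non-contiguous view):

from pycuda import gpuarray
import pycuda.autoinit  # initializes the CUDA context
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# ascontiguousarray copies the strided view data.T into a fresh
# C-contiguous host buffer, which to_gpu() can then transfer verbatim.
data_gpu = gpuarray.to_gpu(numpy.ascontiguousarray(data.T))

print(data_gpu.get())  # agrees with data.T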