PyCUDA messes up numpy matrix transpose

Posted on 2024-11-27 08:33:26

Why does the transposed matrix look different when it is converted to a pycuda.gpuarray?

Can you reproduce this? What could cause this? Am I using the wrong approach?

Example code

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2,4).astype(numpy.float32)
data_gpu = gpuarray.to_gpu(data.T)
print "data\n",data
print "data_gpu.get()\n",data_gpu.get()
print "data.T\n",data.T

Output

data
[[ 0.70442784  0.08845157 -0.84840715 -1.81618035]
 [ 0.55292499  0.54911566  0.54672164  0.05098847]]
data_gpu.get()
[[ 0.70442784  0.08845157]
 [-0.84840715 -1.81618035]
 [ 0.55292499  0.54911566]
 [ 0.54672164  0.05098847]]
data.T
[[ 0.70442784  0.55292499]
 [ 0.08845157  0.54911566]
 [-0.84840715  0.54672164]
 [-1.81618035  0.05098847]]

Comments (2)

养猫人 2024-12-04 08:33:26

The basic reason is that numpy's transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA accesses directly when copying to device memory. The solution is to call copy() on the transpose, which creates a host array whose data is laid out in transposed order, and then transfer that to the device:

data_gpu = gpuarray.to_gpu(data.T.copy())
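
A quick way to see this without involving the GPU is to check numpy's contiguity flags and to reinterpret the original buffer using the transposed shape. A minimal sketch, assuming a reasonably recent numpy (tobytes/frombuffer):

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

print(data.T.flags['C_CONTIGUOUS'])         # False: data.T is only a strided view
print(data.T.copy().flags['C_CONTIGUOUS'])  # True: copy() materialises the transposed layout

# Reading the original buffer row-major with the transposed shape reproduces the
# "wrong" array seen in data_gpu.get() above
naive = numpy.frombuffer(data.tobytes(), dtype=numpy.float32).reshape(4, 2)
print(numpy.array_equal(naive, data.T))          # False
print(numpy.array_equal(data.T.copy(), data.T))  # True
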
夏末 2024-12-04 08:33:26

In numpy, data.T doesn't do anything to the underlying 1D array. It simply manipulates the strides to obtain the transpose. This makes it a constant-time and constant-memory operation.
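
A small CPU-only sketch of that point; the stride values assume float32 (4-byte) elements:

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

print(data.strides)                       # (16, 4): 16 bytes to the next row, 4 to the next column
print(data.T.strides)                     # (4, 16): same buffer, strides swapped
print(numpy.shares_memory(data, data.T))  # True: the transpose is a view, nothing is copied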

It would appear that gpuarray.to_gpu() isn't respecting the strides and is simply copying the underlying 1D array. This would produce the exact behaviour you're observing.

In my view there is nothing wrong with your code. Rather, I would consider this a bug in pycuda.

I've googled around, and have found a thread that discusses this issue in detail.

As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in the host RAM.
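
A minimal sketch of that workaround, assuming PyCUDA is installed and a CUDA device is available:

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# ascontiguousarray makes a C-contiguous host copy of the view, so the flat
# buffer PyCUDA transfers already has the transposed layout
data_gpu = gpuarray.to_gpu(numpy.ascontiguousarray(data.T))

print(numpy.array_equal(data_gpu.get(), data.T))  # True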
