Optimizing GPU allocation/transfer of matrix tiles
I am working with very large matrices (>1GB) but imagine that I have the following matrix:
A = Float64[1 1 2 2;
1 1 2 2;
3 3 4 4;
3 3 4 4]
I need to pin each tile of this matrix so that I can transfer the tiles to the GPU asynchronously (using the CUDA.jl package).
The following code allocates space for each tile on the GPU, and it works:
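using CUDA

# Allocate an m x n Float64 tile on the GPU and keep a reference to the
# raw device buffer in gpu_buf.
function allocGPU!(gpu_buf, m, n)
    dev_buf = CUDA.Mem.alloc(CUDA.Mem.DeviceBuffer, m * n * sizeof(Float64))
    dev_ptr = convert(CuPtr{Float64}, dev_buf)
    push!(gpu_buf, dev_buf)
    tile_gpu = unsafe_wrap(CuArray{Float64}, dev_ptr, (m, n))
    return tile_gpu
end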
m, n = 2, 2  # tile size
A_coor = [(1:2,1:2) (1:2,3:4);
          (3:4,1:2) (3:4,3:4)]
# Each A[rows, cols] slice allocates a new Array holding a copy of the tile.
A_tiles = [A[A_coor[i,j][1], A_coor[i,j][2]] for i=1:size(A_coor,1), j=1:size(A_coor,2)]
gpu_buf = []
A_tiles_gpu = [allocGPU!(gpu_buf, m, n) for i=1:size(A_tiles,1), j=1:size(A_tiles,2)]
However, this copies each tile into a new object, which takes more time than I would like. Is there any way to wrap a 2x2 Array around each tile in order to reduce the number of allocations?
I also tried this line:
A_tiles = [unsafe_wrap(Array{Float64}, pointer(A[A_coor[i,j][1], A_coor[i,j][2]]), (m,n)) for i=1:size(A_coor,1), j=1:size(A_coor,2)]
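For reference, the difference between slicing and taking a view can be checked on the CPU alone. The sketch below uses only Base Julia (no CUDA.jl), and the names tile_copy and tile_view are illustrative. It suggests why a tile cannot simply be wrapped as a dense 2x2 Array: in column-major storage the columns of a tile are size(A, 1) elements apart, so the tile is not one contiguous block of memory.

```julia
# CPU-only sketch: slicing copies, a view does not.
A = Float64[1 1 2 2;
            1 1 2 2;
            3 3 4 4;
            3 3 4 4]

tile_copy = A[1:2, 3:4]          # allocates a new 2x2 Array
tile_view = @view A[1:2, 3:4]    # SubArray that aliases A's memory

# Writing through the view changes A; writing to the copy does not.
tile_view[1, 1] = 20.0           # same element as A[1, 3]
tile_copy[2, 2] = -1.0           # A[2, 4] is left untouched

# Element strides of the view: 1 within a column, size(A, 1) between columns.
tile_strides = strides(tile_view)
```

Because A[...] has already produced a copy before pointer is taken, the unsafe_wrap line above wraps that temporary copy rather than the memory of A.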
I also thought of pinning matrix A and then transferring each tile to the GPU with:
copyto!(tile_gpu, A[1:2,1:2])
but I'm guessing Julia will copy A[1:2,1:2] into a new object and then transfer the tile, yielding the same result as the first method.
Edit:
As I suspected,
copyto!(tile_gpu, A[1:2,1:2])
creates a new object in a different memory location. I also tried the @view macro; although it works on the CPU, it doesn't seem to work with copyto! to GPU memory.
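One possible workaround, sketched under assumptions: each column of a tile is contiguous in A, so a tile can be copied column by column with the five-argument copyto!(dest, doffs, src, soffs, n), which takes linear offsets and does not create a temporary Array. The helper name copy_tile_by_columns! is hypothetical, and the example runs on the CPU with a plain Matrix as the destination. As far as I know, CUDA.jl defines the same five-argument copyto! for a CuArray destination and an Array source, but treat that as an assumption to verify against your CUDA.jl version.

```julia
# Copy the tile A[rows, cols] into a dense length(rows) x length(cols)
# destination one column at a time, using linear offsets only.
function copy_tile_by_columns!(dest, A::Matrix, rows::UnitRange{Int}, cols::UnitRange{Int})
    m = length(rows)
    lin = LinearIndices(A)
    for (k, j) in enumerate(cols)
        src_offset = lin[first(rows), j]   # linear index in A of the column's first element
        dst_offset = (k - 1) * m + 1       # start of column k in the dense destination
        copyto!(dest, dst_offset, A, src_offset, m)
    end
    return dest
end

A = Float64[1 1 2 2;
            1 1 2 2;
            3 3 4 4;
            3 3 4 4]

# Stand-in for tile_gpu; with CUDA.jl this would be the CuArray tile (assumption).
tile = Matrix{Float64}(undef, 2, 2)
copy_tile_by_columns!(tile, A, 3:4, 1:2)
```

This trades one large copy for one small copy per tile column, so whether it is faster than copying the tile first would need to be measured on the real matrices.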