CUDA: statically allocating data on the device
I've been trying to allocate a variable that can be accessed by every kernel function. My attempt is the code attached below, but it won't compile because dArray can't be seen or accessed by the kernel. In C++ you would place the variable at file scope, or declare it static, so it could be accessed from every scope throughout the program.
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

__global__ void StoreThreadNumber()
{
    dArray[threadIdx.x] = threadIdx.x;   // error: dArray is not visible in this scope
}

int main( int argc, char** argv)
{
    unsigned __int8 Array[16] = { 0 };
    unsigned __int8 dArray[16];

    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaMalloc((void**) dArray, 16*sizeof(__int8));
    cudaMemcpy( dArray, Array, 16*sizeof(__int8), cudaMemcpyHostToDevice);

    StoreThreadNumber<<<1, 16>>>();

    cudaMemcpy( Array, dArray, 16*sizeof(__int8), cudaMemcpyDeviceToHost);
    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaFree(dArray);
}
2 Answers
You can have global variables in CUDA, of type __device__ or __constant__. So, for example, if you initialize a __constant__ pointer variable to the address of a device pointer using cudaMemcpyToSymbol(), you can then access that pointer via the __constant__ variable. Just make sure you correctly initialize dArrayPtr from your host code before you run the kernel.
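A minimal sketch of that approach, assuming the same 16-byte buffer as in the question; dArrayPtr and devBuffer are illustrative names:

#include <cstdio>
#include <cuda_runtime.h>

// Pointer stored in constant memory; every kernel can read it once it is set from the host.
__constant__ unsigned char* dArrayPtr;

__global__ void StoreThreadNumber()
{
    // The pointer itself lives in constant memory; the buffer it points to is writable global memory.
    dArrayPtr[threadIdx.x] = threadIdx.x;
}

int main()
{
    unsigned char hostArray[16] = { 0 };

    unsigned char* devBuffer = nullptr;
    cudaMalloc((void**)&devBuffer, 16 * sizeof(unsigned char));

    // Copy the device address into the __constant__ symbol before launching the kernel.
    cudaMemcpyToSymbol(dArrayPtr, &devBuffer, sizeof(devBuffer));

    StoreThreadNumber<<<1, 16>>>();

    cudaMemcpy(hostArray, devBuffer, 16 * sizeof(unsigned char), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 16; ++i)
        printf("%u ", hostArray[i]);
    printf("\n");

    cudaFree(devBuffer);
    return 0;
}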
You can't. You have to pass a pointer to dArray to the kernel.
I had the same problem, having to pass a lot of global data to the GPU. I ended up wrapping it all in a struct and passing a pointer to it around.
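A minimal sketch of passing the pointer as a kernel argument, reusing the names from the question (the struct variant would simply bundle several such pointers into one struct and pass a pointer to that struct instead):

#include <cuda_runtime.h>

__global__ void StoreThreadNumber(unsigned char* dArray)
{
    // dArray arrives as a kernel parameter, so every thread can see it.
    dArray[threadIdx.x] = threadIdx.x;
}

int main()
{
    unsigned char hostArray[16] = { 0 };

    unsigned char* dArray = nullptr;      // device pointer, not a host array
    cudaMalloc((void**)&dArray, 16 * sizeof(unsigned char));
    cudaMemcpy(dArray, hostArray, 16 * sizeof(unsigned char), cudaMemcpyHostToDevice);

    StoreThreadNumber<<<1, 16>>>(dArray); // the pointer travels as an argument

    cudaMemcpy(hostArray, dArray, 16 * sizeof(unsigned char), cudaMemcpyDeviceToHost);
    cudaFree(dArray);
    return 0;
}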