Pointers in structs passed to CUDA
I've been messing around with this for a while now, but can't seem to get it right. I'm trying to copy objects that contain arrays into CUDA device memory (and back again, but I'll cross that bridge when I come to it):
struct MyData {
    float *data;
    int dataLen;
};

void copyToGPU() {
    // Create dummy objects to copy
    int N = 10;
    MyData *h_items = new MyData[N];
    for (int i=0; i<N; i++) {
        h_items[i].dataLen = 100;
        h_items[i].data = new float[100];
    }
    // Copy objects to GPU
    MyData *d_items;
    int memSize = N * sizeof(MyData);
    cudaMalloc((void**)&d_items, memSize);
    cudaMemcpy(d_items, h_items, memSize, cudaMemcpyHostToDevice);
    // Run the kernel
    MyFunc<<<100,100>>>(d_items);
}

__global__
static void MyFunc(MyData *data) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i=0; i<data[idx].dataLen; i++) {
        // Do something with data[idx].data[i]
    }
}
When I call MyFunc(d_items), I can access data[idx].dataLen just fine. However, data[idx].data has not been copied yet.
I can't use d_items[i].data in copyToGPU as a destination for cudaMalloc/cudaMemcpy operations, since host code cannot dereference a device pointer.
What to do?
structures, as a single array.
GPU.
example:
The code you provide copies only the MyData structures: a host address and an integer. To be overly clear, you are copying the pointer and not the data, so you have to copy the data explicitly.
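One way to copy the data explicitly while keeping the pointer-in-struct layout from the question is sketched below. It is only an illustration, not the answerer's code: the kernel sumItems, the function deepCopyDemo and the staging vector are hypothetical names introduced here. The idea is to allocate each array on the device, store the resulting device addresses in a host-side staging copy of the structs, and only then copy the struct array. The host never dereferences a device pointer; it just stores the address.

```cuda
#include <vector>
#include <cuda_runtime.h>

struct MyData {
    float *data;
    int dataLen;
};

// Illustrative kernel: each thread sums the array owned by one item.
__global__ void sumItems(const MyData *items, int n, float *sums) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;                       // guard against extra threads
    float s = 0.0f;
    for (int i = 0; i < items[idx].dataLen; i++) s += items[idx].data[i];
    sums[idx] = s;
}

// Hypothetical driver: returns 0 if every per-item sum is as expected.
int deepCopyDemo() {
    const int N = 10, LEN = 100;
    std::vector<float> h_array(LEN, 1.0f);      // same dummy payload for every item

    // Host-side staging structs whose data members hold DEVICE addresses.
    std::vector<MyData> staged(N);
    for (int i = 0; i < N; i++) {
        staged[i].dataLen = LEN;
        cudaMalloc((void **)&staged[i].data, LEN * sizeof(float));
        cudaMemcpy(staged[i].data, h_array.data(), LEN * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    // Now the struct array itself can be copied; its pointers are valid on the GPU.
    MyData *d_items = nullptr;
    float *d_sums = nullptr;
    cudaMalloc((void **)&d_items, N * sizeof(MyData));
    cudaMalloc((void **)&d_sums, N * sizeof(float));
    cudaMemcpy(d_items, staged.data(), N * sizeof(MyData), cudaMemcpyHostToDevice);

    sumItems<<<1, 32>>>(d_items, N, d_sums);

    std::vector<float> h_sums(N, 0.0f);
    cudaError_t err = cudaMemcpy(h_sums.data(), d_sums, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) cudaFree(staged[i].data);
    cudaFree(d_items);
    cudaFree(d_sums);

    if (err != cudaSuccess) return 1;
    for (int i = 0; i < N; i++)
        if (h_sums[i] != static_cast<float>(LEN)) return 1;
    return 0;
}
```

This costs one cudaMalloc and one cudaMemcpy per item, which is why the layouts discussed next are usually preferable when there are many items.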
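If the data is always the same LENGTH, then you probably just want to make one big array of N * LENGTH floats and copy it in one go. If it needs to be in a struct with other data, then make the array a fixed-size member of the struct (float data[LENGTH]) instead of a pointer, so that copying the structs copies the data too.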
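A minimal sketch of the fixed-length case, assuming the array lives inside the struct next to the other fields. The names MyFixedData, scaleItems and fixedLengthDemo are made up for this illustration, and LENGTH is set to 100 only to match the question. Because the struct no longer contains a pointer, a single cudaMemcpy moves the data along with everything else.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int LENGTH = 100;   // assumed fixed array length

// Hypothetical struct: the array is embedded, so there is no pointer to fix up.
struct MyFixedData {
    float data[LENGTH];
    int dataLen;
};

// Illustrative kernel: each thread scales the array of one item in place.
__global__ void scaleItems(MyFixedData *items, int n, float factor) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    for (int i = 0; i < items[idx].dataLen; i++) items[idx].data[i] *= factor;
}

// Hypothetical driver: returns 0 if the round trip produced the expected values.
int fixedLengthDemo() {
    const int N = 10;
    std::vector<MyFixedData> h_items(N);
    for (int i = 0; i < N; i++) {
        h_items[i].dataLen = LENGTH;
        for (int j = 0; j < LENGTH; j++) h_items[i].data[j] = 1.0f;
    }

    MyFixedData *d_items = nullptr;
    size_t memSize = N * sizeof(MyFixedData);
    cudaMalloc((void **)&d_items, memSize);
    // One copy transfers the structs and their embedded arrays together.
    cudaMemcpy(d_items, h_items.data(), memSize, cudaMemcpyHostToDevice);

    scaleItems<<<1, 32>>>(d_items, N, 2.0f);

    cudaError_t err = cudaMemcpy(h_items.data(), d_items, memSize,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_items);

    if (err != cudaSuccess) return 1;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < LENGTH; j++)
            if (h_items[i].data[j] != 2.0f) return 1;
    return 0;
}
```

If there are no other fields, the same pattern collapses to a plain float array of N * LENGTH elements, with item i starting at index i * LENGTH.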
But, I am assuming you have data that is a variety of lengths. One solution is to set LENGTH to be the maximum length (and just waste some space), and then do it the same way as above. That might be the easiest way to start, and then you can optimize later.
If you can't afford the lost memory and transfer time, then I would use three arrays on both the host and the device: one with all the data, one with the offsets, and one with the lengths.
Now in thread i you can find the data that starts at d_data[d_offsets[i]] and has a length of d_lengths[i].
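The three-array layout could look roughly like the sketch below. The array names h_data/d_data, h_offsets/d_offsets and h_lengths/d_lengths follow the answer's wording; the kernel sumSegments, the function packedDemo and the sample lengths are assumptions added for illustration. Each thread reads its segment through the offset array and uses the length array as the loop bound.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: thread i sums the segment starting at d_data[d_offsets[i]].
__global__ void sumSegments(const float *d_data, const int *d_offsets,
                            const int *d_lengths, int n, float *d_sums) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const float *segment = d_data + d_offsets[i];
    float s = 0.0f;
    for (int k = 0; k < d_lengths[i]; k++) s += segment[k];
    d_sums[i] = s;
}

// Hypothetical driver: returns 0 if every segment sum equals its length.
int packedDemo() {
    std::vector<int> h_lengths = {3, 1, 4, 2};  // made-up variable lengths
    const int N = static_cast<int>(h_lengths.size());

    // Offsets are the running total of the lengths.
    std::vector<int> h_offsets(N);
    int total = 0;
    for (int i = 0; i < N; i++) {
        h_offsets[i] = total;
        total += h_lengths[i];
    }
    std::vector<float> h_data(total, 1.0f);     // all segments packed back to back

    float *d_data = nullptr, *d_sums = nullptr;
    int *d_offsets = nullptr, *d_lengths = nullptr;
    cudaMalloc((void **)&d_data, total * sizeof(float));
    cudaMalloc((void **)&d_offsets, N * sizeof(int));
    cudaMalloc((void **)&d_lengths, N * sizeof(int));
    cudaMalloc((void **)&d_sums, N * sizeof(float));

    // Three transfers in total, regardless of how many items there are.
    cudaMemcpy(d_data, h_data.data(), total * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, h_offsets.data(), N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lengths, h_lengths.data(), N * sizeof(int), cudaMemcpyHostToDevice);

    sumSegments<<<1, 32>>>(d_data, d_offsets, d_lengths, N, d_sums);

    std::vector<float> h_sums(N, 0.0f);
    cudaError_t err = cudaMemcpy(h_sums.data(), d_sums, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_offsets);
    cudaFree(d_lengths);
    cudaFree(d_sums);

    if (err != cudaSuccess) return 1;
    for (int i = 0; i < N; i++)
        if (h_sums[i] != static_cast<float>(h_lengths[i])) return 1;
    return 0;
}
```

Compared with padding every item to the maximum length, this wastes no space, at the price of one extra indirection per thread.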