CUDA program gives garbage values

Posted 2024-11-29 12:28:17 · 650 words · 0 views · 0 comments

I really do not understand why the code below does not print a and b.

#include <cutil.h>
#include <iostream>

__global__ void p(unsigned char **a){
    unsigned char temp[2];
    temp[0] = 'a';
    temp[1] = 'b';
    a[0] = temp;
}

void main(){
    unsigned char **a;
    cudaMalloc((void**)&a, sizeof(unsigned char*));
    p<<<1,1>>>(a);
    unsigned char **c;
    unsigned char b[2];
    cudaMemcpy(c, a, sizeof(unsigned char *), cudaMemcpyDeviceToHost);
    cudaMemcpy(b, c[0], 2*sizeof(unsigned char), cudaMemcpyDeviceToHost);
    for( int i=0 ; i < 2; i++){
        printf("%c\n", b[i]);
    }

    getchar();
}

What is wrong with my logic?

千纸鹤 2024-12-06 12:28:17

Let's leave out CUDA for now. Let's just make a function that writes data to a user-provided array. The user passes the array via a pointer:

void fill_me_up(int * dst)
{
  // We sure hope that `dst` points to a large enough area of memory!

  dst[0] = 28;
  dst[1] = 75;
}

Now, what you're doing with the local variable doesn't work: you store the address of a local array, and that address becomes invalid as soon as you leave the function scope. The next best thing you could do is memcpy(), or some equivalent C++ algorithm, so that the values are copied rather than the address:

#include <string.h> /* memcpy */

void fill_me_up_again(int * dst)
{
  int temp[] = { 28, 75 };
  /* Copy the values out; temp itself stops existing when the function returns. */
  memcpy(dst, temp, sizeof(temp));
}

OK, now on to calling that function: We first must provide the target memory, and then pass a pointer:

#include <stdlib.h> /* malloc, free */

int main(void)
{
  int my_memory[2]; // here's our memory -- automatic local storage

  fill_me_up(my_memory);     // OK, array decays to pointer-to-beginning
  fill_me_up(&my_memory[0]); // A bit more explicit

  int * your_memory = malloc(sizeof(int) * 2); // more memory, this time dynamic
  if (your_memory == NULL) return 1;           // allocation failed
  fill_me_up_again(your_memory);
  /* ... */
  free(your_memory);
  return 0;
}

(In C++ you would probably use new int[2] and delete[] your_memory instead, but by using C malloc() the connection to CUDA hopefully becomes clear.)

When you move fill_me_up to the CUDA device, you have to give it a device pointer rather than a host pointer, so you have to set that one up first (with cudaMalloc) and afterwards copy the results back out (with cudaMemcpy), but that's about the only change.
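To make that last step concrete, here is a minimal sketch of how the question's program might look once the kernel writes into caller-provided device memory. It is an illustration under assumptions, not the only possible fix: the helper name fetch_ab and the variable d_buf are made up for this example, the pointer-to-pointer from the question is replaced by a plain buffer, and the old cutil.h header is swapped for cuda_runtime.h.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel counterpart of fill_me_up: it writes into memory that the
// caller allocated, instead of handing out the address of a local array.
__global__ void fill_me_up(unsigned char *dst)
{
    dst[0] = 'a';
    dst[1] = 'b';
}

// Hypothetical helper (not from the original post): allocates the device
// buffer, runs the kernel, and copies the two bytes back to the host.
// Returns true if every CUDA call succeeded.
bool fetch_ab(unsigned char out[2])
{
    unsigned char *d_buf = nullptr; // device pointer, set up by cudaMalloc
    if (cudaMalloc((void **)&d_buf, 2 * sizeof(unsigned char)) != cudaSuccess)
        return false;

    fill_me_up<<<1, 1>>>(d_buf);
    cudaError_t err = cudaGetLastError(); // reports launch failures

    if (err == cudaSuccess) {
        // This blocking copy runs after the kernel launched above on the
        // default stream, so the data is ready when it returns.
        err = cudaMemcpy(out, d_buf, 2 * sizeof(unsigned char),
                         cudaMemcpyDeviceToHost);
    }

    cudaFree(d_buf);
    return err == cudaSuccess;
}

int main()
{
    unsigned char b[2];
    if (!fetch_ab(b)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 2; i++)
        printf("%c\n", b[i]);
    return 0;
}
```

Compared with the question's code, the host never dereferences a device pointer (the original read c[0] through an uninitialized host pointer), and nothing outlives the kernel except the buffer that cudaMalloc created. On a machine with a working CUDA device this can be built with nvcc.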
