How do I pass data to shared variables in CUDA?

Posted on 2024-08-31 03:30:58

I have a kernel that takes three arrays. The first array, d_A1, holds no input data and is used only to write results back; the other two arrays, d_D1 and d_ST1, hold input data.

The size of the first array is:

d_A1[13000000]

The size of the second array is:

d_D1[421]

The size of the third array is:

d_ST1[21]

N is 13000000

TestArray<<<n_blocks, block_size>>>(d_A1,N, d_D1, d_ST1);

Now I want to copy only the data of d_D1[421] and d_ST1[21] into shared arrays, so I declared the shared arrays as:

__global__ void TestArray(int* A1, unsigned int N,  int* D1, unsigned int* ST1)
{

   unsigned int __align__(16) tid = threadIdx.x;
   unsigned int __align__(16) idx = __umul24(blockDim.x, blockIdx.x) + threadIdx.x;  
   __shared__ unsigned int __align__(16) s_D1[441];  //Shared array for d_D1
   __shared__ unsigned int __align__(16) s_ST1[21];  //Shared array for d_ST1

   if (idx < N)   //13000000

   {

Q. How do I copy the data of d_D1[441] and d_ST1[21] into s_D1[441] and s_ST1[21]?
I tried:

      while (idx < 441)

        s_D1[tid] = d_D1[idx] 

      __syncthreads(); 


      while (idx < 21)

        s_ST1[tid] = d_ST1[idx] 


      __syncthreads();  

but the computer freezes and I have to restart it.
I also tried them one at a time, namely only the first
while and then only the second while, with no luck.

If I use global memory directly, namely d_D1 and d_ST1, everything works.
So the question is:
How do you copy data into a shared variable/array when the size of the array is not N?

   }   //End of kernel processing



}
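
A note on the attempt above, with a hedged sketch of one common fix. The body of `while (idx < 441)` never changes `idx`, so any thread that enters the loop never leaves it; a kernel that never returns is a plausible cause of the freeze, particularly on a GPU that also drives the display. Two further points: shared memory is private to each block, so every block has to fill its own copy using the local index `tid` rather than the global `idx`, and `__syncthreads()` should sit outside `if (idx < N)` so that all threads of a block reach it. Inside the kernel the parameters are also named `D1` and `ST1`, not `d_D1` and `d_ST1`. The sketch below assumes the sizes 441 and 21 from the shared declarations and that `D1` holds `int` values; the arithmetic written to `A1` is only a placeholder, since the original computation is not shown, and `D1_SIZE`, `ST1_SIZE`, and `run_test_array_demo` are hypothetical names introduced for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define D1_SIZE  441   // assumed: taken from the s_D1 declaration in the question
#define ST1_SIZE 21    // taken from the s_ST1 declaration in the question

__global__ void TestArray(int* A1, unsigned int N,
                          const int* D1, const unsigned int* ST1)
{
    __shared__ int          s_D1[D1_SIZE];    // per-block copy of D1
    __shared__ unsigned int s_ST1[ST1_SIZE];  // per-block copy of ST1

    unsigned int tid = threadIdx.x;
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;

    // Every block loads the full tables. The stride of blockDim.x lets the
    // loops terminate and still cover all elements when blockDim.x < size.
    for (unsigned int i = tid; i < D1_SIZE; i += blockDim.x)
        s_D1[i] = D1[i];
    for (unsigned int i = tid; i < ST1_SIZE; i += blockDim.x)
        s_ST1[i] = ST1[i];

    // Reached by every thread of the block, outside any conditional.
    __syncthreads();

    if (idx < N) {
        // Placeholder computation (an assumption, not the original kernel body).
        A1[idx] = s_D1[idx % D1_SIZE] + (int)s_ST1[idx % ST1_SIZE];
    }
}

// Hypothetical host helper: runs the kernel and returns the number of
// elements that differ from a CPU reference, or -1 on a CUDA error.
int run_test_array_demo(unsigned int N, unsigned int block_size)
{
    std::vector<int> h_D1(D1_SIZE);
    std::vector<unsigned int> h_ST1(ST1_SIZE);
    for (int i = 0; i < D1_SIZE; ++i)  h_D1[i]  = 2 * i;
    for (int i = 0; i < ST1_SIZE; ++i) h_ST1[i] = i + 1;

    int* d_A1 = nullptr;
    int* d_D1 = nullptr;
    unsigned int* d_ST1 = nullptr;
    cudaError_t err = cudaMalloc(&d_A1, N * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_D1, D1_SIZE * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_ST1, ST1_SIZE * sizeof(unsigned int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_D1, h_D1.data(), D1_SIZE * sizeof(int),
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_ST1, h_ST1.data(), ST1_SIZE * sizeof(unsigned int),
                         cudaMemcpyHostToDevice);

    std::vector<int> h_A1(N, -1);
    if (err == cudaSuccess) {
        unsigned int n_blocks = (N + block_size - 1) / block_size;  // round up
        TestArray<<<n_blocks, block_size>>>(d_A1, N, d_D1, d_ST1);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_A1.data(), d_A1, N * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_A1);
    cudaFree(d_D1);
    cudaFree(d_ST1);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (unsigned int i = 0; i < N; ++i) {
        int expected = h_D1[i % D1_SIZE] + (int)h_ST1[i % ST1_SIZE];
        if (h_A1[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_test_array_demo(100000, 256)` should report zero mismatches on a working CUDA device. A block size below 21, such as 16, exercises the strided loops, which is the case a plain `if (tid < 21)` guard would not cover.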

Comments (1)

昵称有卵用 2024-09-07 03:30:58

Great, you're right: it has to be if (tid < 21) d_ST1[tid] = ST1[tid], since one block is enough for those two sizes. Thank you very much indeed.
However, when do you use if (idx < n) d_st1[tid] = st1[idx]?
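
One hedged answer to that follow-up: the `tid`-only form suits a small table that every block needs in full, whereas the `idx` form suits a large array where each block stages only its own slice, reading with the global index and writing with the local one. The sketch below illustrates that second pattern with a per-block partial sum. It is an illustration under stated assumptions rather than code from this thread: `BlockSum`, `TILE`, and `run_block_sum_demo` are hypothetical names, and the kernel assumes it is launched with exactly `TILE` threads per block, where `TILE` is a power of two.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 256  // assumed block size; must be a power of two for the reduction

// Each block copies its own slice of `in` into shared memory and
// writes one partial sum to block_sums[blockIdx.x].
__global__ void BlockSum(const int* in, int* block_sums, unsigned int n)
{
    __shared__ int s_tile[TILE];

    unsigned int tid = threadIdx.x;
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;

    // Global index on the read, local index on the write.
    // Threads past the end of the array contribute 0.
    s_tile[tid] = (idx < n) ? in[idx] : 0;
    __syncthreads();

    // Tree reduction within the block; the barrier is outside the if,
    // so every thread reaches it on every iteration.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s_tile[tid] += s_tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        block_sums[blockIdx.x] = s_tile[0];
}

// Hypothetical host helper: sums 1..n on the GPU (n > 0) and returns
// the total, or -1 on a CUDA error.
long long run_block_sum_demo(unsigned int n)
{
    std::vector<int> h_in(n);
    for (unsigned int i = 0; i < n; ++i) h_in[i] = (int)(i + 1);

    unsigned int n_blocks = (n + TILE - 1) / TILE;  // round up
    std::vector<int> h_sums(n_blocks, 0);

    int* d_in = nullptr;
    int* d_sums = nullptr;
    cudaError_t err = cudaMalloc(&d_in, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_sums, n_blocks * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, h_in.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        BlockSum<<<n_blocks, TILE>>>(d_in, d_sums, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_sums.data(), d_sums, n_blocks * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_sums);
    if (err != cudaSuccess) return -1;

    // Finish the reduction on the host.
    long long total = 0;
    for (int s : h_sums) total += s;
    return total;
}
```

Here the guard `idx < n` protects the global read in the last, partially filled block, while `tid` selects the slot inside that block's shared tile.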
