Pointer arithmetic with shared memory

Posted 2024-12-06 23:09:06

I don't understand what exactly happens in the following lines:

  1. unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    
  2. float *clusters = (float *)(sharedMemory + blockDim.x);
    

I assume that in #1 sharedMemory is effectively renamed to membershipChanged, but why would you add blockDim.x to the sharedMemory pointer? Where does the resulting address point?

sharedMemory was created with extern __shared__ char sharedMemory[];


I found the code in a CUDA k-means implementation:

void find_nearest_cluster(int numCoords,
                          int numObjs,
                          int numClusters,
                          float *objects,           //  [numCoords][numObjs]
                          float *deviceClusters,    //  [numCoords][numClusters]
                          int *membership,          //  [numObjs]
                          int *intermediates)
{
    extern __shared__ char sharedMemory[];

    //  The type chosen for membershipChanged must be large enough to support
    //  reductions! There are blockDim.x elements, one for each thread in the
    //  block.
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = (float *)(sharedMemory + blockDim.x);

    membershipChanged[threadIdx.x] = 0;

    //  BEWARE: We can overrun our shared memory here if there are too many
    //  clusters or too many coordinates!
    for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
        for (int j = 0; j < numCoords; j++) {
            clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
        }
    }
    .....

Comments (1)

最美不过初阳 2024-12-13 23:09:06

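Because sharedMemory is declared as a char array, pointer arithmetic on it advances in one-byte steps, so sharedMemory + blockDim.x points blockDim.x bytes past the base of the shared memory region.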

The reason to do something like this is to suballocate within shared memory. The launch site of the find_nearest_cluster kernel dynamically allocates some amount of shared storage for the kernel. The code implies that two logically distinct arrays reside in the storage that sharedMemory points to: membershipChanged, which occupies the first blockDim.x bytes (one unsigned char per thread), and clusters, which starts immediately after it. The pointer arithmetic is simply a means of getting a pointer to the second array.
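
To make the suballocation concrete, here is a minimal sketch of the same pattern in a standalone program. It is not taken from the k-means code: the kernel carve_demo and the names flags, values, numFloats, and out are hypothetical, chosen only for illustration. It also assumes blockDim.x is a multiple of sizeof(float), so that the float pointer is properly aligned. The third parameter in the <<<...>>> launch configuration is the number of bytes of dynamic shared memory per block, and it has to cover both arrays.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel (not from the k-means code): splits one dynamic
// shared-memory buffer into a byte array followed by a float array.
__global__ void carve_demo(int numFloats, float *out)
{
    extern __shared__ char sharedMemory[];

    // First array: blockDim.x one-byte flags, starting at byte offset 0.
    unsigned char *flags = (unsigned char *)sharedMemory;

    // Second array: floats starting blockDim.x bytes past the base.
    // Assumption: blockDim.x is a multiple of sizeof(float), so this
    // float pointer is properly aligned.
    float *values = (float *)(sharedMemory + blockDim.x);

    flags[threadIdx.x] = 0;
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x) {
        values[i] = (float)i;
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        out[0] = (float)((char *)values - sharedMemory);  // byte offset of values
        out[1] = values[numFloats - 1];                   // last float written
    }
}

int main()
{
    const int threads = 128;   // blockDim.x; a multiple of 4 keeps values aligned
    const int numFloats = 32;

    // Bytes needed for both arrays; passed as the third launch parameter.
    const size_t sharedBytes =
        threads * sizeof(unsigned char) + numFloats * sizeof(float);

    float *dOut = nullptr;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) {
        return 1;
    }

    carve_demo<<<1, threads, sharedBytes>>>(numFloats, dOut);

    float hOut[2] = {0.0f, 0.0f};
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("values starts at byte offset %.0f; last value is %.1f\n",
           hOut[0], hOut[1]);
    return 0;
}
```

If sharedBytes is smaller than what the kernel actually touches, the writes through the second pointer run past the allocation, which is the overrun the BEWARE comment in the question's code warns about.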
