Pointer arithmetic with shared memory
I don't understand what exactly happens in the following lines:
unsigned char *membershipChanged = (unsigned char *)sharedMemory;
float *clusters = (float *)(sharedMemory + blockDim.x);
I assume that in the first line sharedMemory is effectively renamed to membershipChanged, but why would you add blockDim.x to the sharedMemory pointer? Where does this address point?
sharedMemory was declared with extern __shared__ char sharedMemory[];
I found the code in a CUDA k-means implementation.
void find_nearest_cluster(int numCoords,
                          int numObjs,
                          int numClusters,
                          float *objects,         // [numCoords][numObjs]
                          float *deviceClusters,  // [numCoords][numClusters]
                          int *membership,        // [numObjs]
                          int *intermediates)
{
    extern __shared__ char sharedMemory[];

    // The type chosen for membershipChanged must be large enough to support
    // reductions! There are blockDim.x elements, one for each thread in the
    // block.
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = (float *)(sharedMemory + blockDim.x);

    membershipChanged[threadIdx.x] = 0;

    // BEWARE: We can overrun our shared memory here if there are too many
    // clusters or too many coordinates!
    for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
        for (int j = 0; j < numCoords; j++) {
            clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
        }
    }
.....
Comments (1)
sharedMemory + blockDim.x points blockDim.x bytes away from the base of the shared memory region. Because sharedMemory is declared as a char array, adding blockDim.x advances the pointer by exactly that many bytes.
The reason you might do something like this is to suballocate in shared memory. The launch site of the kernel that includes find_nearest_cluster dynamically allocates some amount of shared storage for the kernel. The code implies that two logically different arrays reside in the shared storage pointed to by sharedMemory: membershipChanged and clusters. The pointer arithmetic is simply a means to get a pointer to the second array.
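To make the suballocation concrete, here is a minimal sketch of the same layout together with a launch that sizes the dynamic shared memory. The names suballocDemo, run_demo, flags and values are hypothetical and do not come from the kmeans code; the block size and element count are arbitrary. The sketch assumes blockDim.x is a multiple of sizeof(float), because the float pointer would otherwise be misaligned.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel with the same layout as find_nearest_cluster:
// [ blockDim.x flag bytes | numFloats floats ]
__global__ void suballocDemo(const float *in, float *out, int numFloats)
{
    extern __shared__ char sharedMemory[];

    // First sub-array: one byte per thread, starting at the base address.
    unsigned char *flags = (unsigned char *)sharedMemory;
    // Second sub-array: starts blockDim.x bytes past the base address.
    // Assumption: blockDim.x is a multiple of sizeof(float), so this
    // pointer is properly aligned for float accesses.
    float *values = (float *)(sharedMemory + blockDim.x);

    flags[threadIdx.x] = 1;
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x)
        values[i] = in[i];
    __syncthreads();

    // Read both sub-arrays back; every flag is 1, so out[i] = in[i] + 1.
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x)
        out[i] = values[i] + flags[i % blockDim.x];
}

// Launches the kernel and returns the number of wrong results
// (0 on success, -1 on a CUDA error).
int run_demo()
{
    const int threads = 128;    // a multiple of 4, see the alignment note
    const int numFloats = 300;
    float hostIn[numFloats], hostOut[numFloats];
    for (int i = 0; i < numFloats; i++)
        hostIn[i] = (float)i;

    float *devIn = nullptr, *devOut = nullptr;
    if (cudaMalloc(&devIn, sizeof(hostIn)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&devOut, sizeof(hostOut)) != cudaSuccess) {
        cudaFree(devIn);
        return -1;
    }
    cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);

    // The third launch parameter is the dynamic shared memory size in
    // bytes. It must cover both sub-arrays.
    size_t sharedBytes = threads * sizeof(unsigned char)
                       + numFloats * sizeof(float);
    suballocDemo<<<1, threads, sharedBytes>>>(devIn, devOut, numFloats);

    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < numFloats; i++)
        if (hostOut[i] != hostIn[i] + 1.0f)
            mismatches++;
    return mismatches;
}
```

The kernel itself never states how large the shared region is; only the launch does. That is why the BEWARE comment in the question applies: if the byte count passed at launch is smaller than blockDim.x plus the space the second array needs, the writes to the second array run past the end of the allocation.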