CUDA memory problem
I have a CUDA kernel which I'm compiling to a cubin file without any special flags:
nvcc text.cu -cubin
It compiles, though with this message:
Advisory: Cannot tell what pointer points to, assuming global memory space
and a reference to a line in some temporary cpp file. I can make the advisory disappear by commenting out some seemingly arbitrary code, which makes no sense to me.
The kernel is as follows:
__global__ void string_search(char** texts, int* lengths, char* symbol, int* matches, int symbolLength)
{
    int localMatches = 0;
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = threadIdx.x + threadIdx.y * blockDim.x;
    int blockThreads = blockDim.x * blockDim.y;
    __shared__ int localMatchCounts[32];
    bool breaking = false;
    for(int i = 0; i < (lengths[blockId] - (symbolLength - 1)); i += blockThreads)
    {
        if(texts[blockId][i] == symbol[0])
        {
            for(int j = 1; j < symbolLength; j++)
            {
                if(texts[blockId][i + j] != symbol[j])
                {
                    breaking = true;
                    break;
                }
            }
            if (breaking) continue;
            localMatches++;
        }
    }
    localMatchCounts[threadId] = localMatches;
    __syncthreads();
    if(threadId == 0)
    {
        int sum = 0;
        for(int i = 0; i < 32; i++)
        {
            sum += localMatchCounts[i];
        }
        matches[blockId] = sum;
    }
}
If I replace the line
localMatchCounts[threadId] = localMatches;
after the first for loop with this line
localMatchCounts[threadId] = 5;
it compiles with no notices. The same happens if I comment out seemingly random parts of the loop above that line. I have also tried replacing the shared memory array with a normal array, to no effect. Can anyone tell me what the problem is?
The system is Vista 64-bit, for what it's worth.
Edit: I fixed the code so it actually works, though it still produces the compiler notice. The warning does not seem to be a problem, at least with regard to correctness (it might affect performance).
Comments (2)
Arrays of pointers such as char** are problematic in kernels: every pointer stored in the array must itself point to device memory, because kernels cannot access the host's memory.
It is better to allocate a single contiguous buffer and divide it in a way that enables parallel access.
In this case I'd define a 1D array that contains all the strings placed one after another, and a second 1D array of size 2*numberOfStrings that holds the offset of each string within the first array and its length.
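For example, the host prepares both arrays before launching the kernel, and the kernel then looks up the offset and length of the string it is responsible for.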
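A minimal sketch of this layout, covering both the host-side preparation and the kernel, might look as follows. The names `string_search_flat`, `countMatches`, `flat` and `meta` are hypothetical and not taken from the original answer. The sketch assumes one block per string and a block size of 128 threads, and it uses `atomicAdd` in place of the shared-memory sum from the question, so it is an illustration rather than a drop-in replacement.

```cuda
#include <cuda_runtime.h>
#include <string>
#include <vector>

// One block per string. meta[2*s] is the offset of string s inside flat,
// meta[2*s + 1] is its length. matches must be zeroed before the launch.
__global__ void string_search_flat(const char* flat, const int* meta,
                                   const char* symbol, int symbolLength,
                                   int* matches)
{
    const char* text = flat + meta[2 * blockIdx.x];
    int length = meta[2 * blockIdx.x + 1];

    int localMatches = 0;
    // Each thread tests start positions threadIdx.x, threadIdx.x + blockDim.x, ...
    for (int i = threadIdx.x; i + symbolLength <= length; i += blockDim.x) {
        bool equal = true;
        for (int j = 0; j < symbolLength; j++) {
            if (text[i + j] != symbol[j]) { equal = false; break; }
        }
        if (equal) localMatches++;
    }
    if (localMatches > 0) atomicAdd(&matches[blockIdx.x], localMatches);
}

// Hypothetical host helper: packs the strings, launches the kernel and
// returns one match count per string. Error checking is omitted for brevity.
std::vector<int> countMatches(const std::vector<std::string>& texts,
                              const std::string& symbol)
{
    int n = static_cast<int>(texts.size());
    std::vector<int> result(n, 0);
    if (n == 0 || symbol.empty()) return result;

    // Build the flat character buffer and the {offset, length} pairs.
    std::string flat;
    std::vector<int> meta;
    for (const std::string& t : texts) {
        meta.push_back(static_cast<int>(flat.size()));  // offset
        meta.push_back(static_cast<int>(t.size()));     // length
        flat += t;
    }

    char* dFlat = nullptr;
    char* dSymbol = nullptr;
    int* dMeta = nullptr;
    int* dMatches = nullptr;
    cudaMalloc((void**)&dFlat, flat.size() + 1);  // +1 avoids a zero-byte allocation
    cudaMalloc((void**)&dSymbol, symbol.size());
    cudaMalloc((void**)&dMeta, meta.size() * sizeof(int));
    cudaMalloc((void**)&dMatches, n * sizeof(int));
    cudaMemcpy(dFlat, flat.data(), flat.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dSymbol, symbol.data(), symbol.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dMeta, meta.data(), meta.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dMatches, 0, n * sizeof(int));

    string_search_flat<<<n, 128>>>(dFlat, dMeta, dSymbol,
                                   static_cast<int>(symbol.size()), dMatches);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), dMatches, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dFlat);
    cudaFree(dSymbol);
    cudaFree(dMeta);
    cudaFree(dMatches);
    return result;
}
```

Because every kernel argument is a plain pointer into a single device allocation, the kernel never dereferences a pointer that was itself loaded from memory, which is the pattern the char** version relies on.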
The problem seems to be associated with the char** parameter. Turning it into a char* made the warning go away, so I suspect that CUDA has trouble with this form of data. Perhaps CUDA prefers that one use its own 2D arrays in this case.
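One way to read "CUDA 2D arrays" is pitched memory allocated with `cudaMallocPitch` and filled with `cudaMemcpy2D`, where the kernel receives a single char* plus the row pitch. The sketch below is written under that assumption. The names `string_search_pitched` and `countMatchesPitched` are hypothetical, every string is padded to the width of the longest one, and `atomicAdd` again stands in for the shared-memory sum from the question.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <string>
#include <vector>

// One block per row. Row r starts at texts + r * pitch, and lengths[r] is the
// number of valid characters in it. matches must be zeroed before the launch.
__global__ void string_search_pitched(const char* texts, size_t pitch,
                                      const int* lengths, const char* symbol,
                                      int symbolLength, int* matches)
{
    const char* text = texts + blockIdx.x * pitch;
    int length = lengths[blockIdx.x];

    int localMatches = 0;
    for (int i = threadIdx.x; i + symbolLength <= length; i += blockDim.x) {
        int j = 0;
        while (j < symbolLength && text[i + j] == symbol[j]) j++;
        if (j == symbolLength) localMatches++;
    }
    if (localMatches > 0) atomicAdd(&matches[blockIdx.x], localMatches);
}

// Hypothetical host helper. Error checking is omitted for brevity.
std::vector<int> countMatchesPitched(const std::vector<std::string>& texts,
                                     const std::string& symbol)
{
    int n = static_cast<int>(texts.size());
    std::vector<int> result(n, 0);
    if (n == 0 || symbol.empty()) return result;

    // Pad every string to the same width so the host rows form a dense 2D array.
    size_t width = 1;
    std::vector<int> lengths;
    for (const std::string& t : texts) {
        width = std::max(width, t.size());
        lengths.push_back(static_cast<int>(t.size()));
    }
    std::vector<char> rows(n * width, 0);
    for (int r = 0; r < n; r++)
        std::copy(texts[r].begin(), texts[r].end(), rows.begin() + r * width);

    char* dTexts = nullptr;
    char* dSymbol = nullptr;
    int* dLengths = nullptr;
    int* dMatches = nullptr;
    size_t pitch = 0;  // row stride in bytes, chosen by the runtime
    cudaMallocPitch((void**)&dTexts, &pitch, width, n);
    cudaMalloc((void**)&dSymbol, symbol.size());
    cudaMalloc((void**)&dLengths, n * sizeof(int));
    cudaMalloc((void**)&dMatches, n * sizeof(int));
    cudaMemcpy2D(dTexts, pitch, rows.data(), width, width, n, cudaMemcpyHostToDevice);
    cudaMemcpy(dSymbol, symbol.data(), symbol.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dLengths, lengths.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dMatches, 0, n * sizeof(int));

    string_search_pitched<<<n, 128>>>(dTexts, pitch, dLengths, dSymbol,
                                      static_cast<int>(symbol.size()), dMatches);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), dMatches, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dTexts);
    cudaFree(dSymbol);
    cudaFree(dLengths);
    cudaFree(dMatches);
    return result;
}
```

Padding wastes memory when the string lengths differ widely, so this layout is mainly attractive when the rows are of similar size.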