Strange CUDA behavior in a vector multiplication program

Posted on 2024-09-07 20:38:52

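I'm having some trouble with a very basic CUDA program. The program multiplies two vectors element by element on the host and on the device and then compares the results. That part works without a problem. The trouble starts when I vary the number of threads and blocks for learning purposes. I have the following kernel: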

__global__ void multiplyVectorsCUDA(float *a,float *b, float *c, int N){
    int idx = threadIdx.x;
    if (idx<N) 
        c[idx] = a[idx]*b[idx];
}

which I call like this:

multiplyVectorsCUDA <<<nBlocks, nThreads>>> (vector_a_d,vector_b_d,vector_c_d,N);

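For the moment I've fixed nBlocks to 1, so I only vary the vector size N and the number of threads nThreads. From what I understand, there is one thread per multiplication, so N and nThreads should be equal.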
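As a minimal sketch of the relationship between N and the launch configuration: the kernel in the question indexes with threadIdx.x only, so with one block it covers exactly nThreads elements. The variant below is not the code from the question. It assumes a global index built from blockIdx.x and blockDim.x, and a hypothetical helper blocksFor that picks enough blocks to cover N, so N and nThreads no longer have to be equal.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant of the kernel (an assumption, not the original): a global index
// lets several blocks share the work.
__global__ void multiplyVectorsGlobalIdx(const float *a, const float *b, float *c, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)              // guard: the last block may have spare threads
        c[idx] = a[idx] * b[idx];
}

// Hypothetical helper: smallest block count with nBlocks * nThreads >= N.
int blocksFor(int N, int nThreads) {
    return (N + nThreads - 1) / nThreads;
}

int main() {
    const int N = 16, nThreads = 4;   // fewer threads per block than elements
    std::vector<float> a(N), b(N), c(N, 0.0f);
    for (int i = 0; i < N; ++i) { a[i] = float(i + 1); b[i] = 2.0f; }

    float *a_d, *b_d, *c_d;
    size_t bytes = N * sizeof(float);
    cudaMalloc(&a_d, bytes);
    cudaMalloc(&b_d, bytes);
    cudaMalloc(&c_d, bytes);
    cudaMemcpy(a_d, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b.data(), bytes, cudaMemcpyHostToDevice);

    multiplyVectorsGlobalIdx<<<blocksFor(N, nThreads), nThreads>>>(a_d, b_d, c_d, N);
    cudaMemcpy(c.data(), c_d, bytes, cudaMemcpyDeviceToHost);

    // Compare against the host result, as the program in the question does.
    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (c[i] != a[i] * b[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(a_d); cudaFree(b_d); cudaFree(c_d);
    return mismatches;
}
```

With nBlocks fixed to 1 and the original threadIdx.x indexing, the same reasoning says that any nThreads smaller than N leaves elements nThreads through N-1 of the output unwritten.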

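The problem is the following:

I first call the kernel with N=16 and nThreads<16, which doesn't work. (This is expected.)
Then I call it with N=16 and nThreads=16, which works fine. (Again, as expected.)
But when I then call it again with N=16 and nThreads<16, it still works!

I don't understand why the last step doesn't fail like the first one. It only fails again if I restart my PC.

Has anyone run into something like this before, or can anyone explain this behavior?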
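One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that device memory is not cleared between launches or allocations, so the output buffer can still hold the correct products from the earlier nThreads=16 run. The sketch below reproduces that effect inside a single process by reusing one output buffer. The names VEC_LEN, setup and launchAndCountMismatches are hypothetical, and whether a fresh cudaMalloc in a later run of the program returns the same uncleared memory is not guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int VEC_LEN = 16;                 // hypothetical name for the vector size N

// Kernel as in the question: one block, indexed by threadIdx.x only.
__global__ void multiplyVectorsCUDA(float *a, float *b, float *c, int N) {
    int idx = threadIdx.x;
    if (idx < N)
        c[idx] = a[idx] * b[idx];
}

float a_h[VEC_LEN], b_h[VEC_LEN];       // host inputs
float *a_d, *b_d, *c_d;                 // device buffers, reused across launches

// Hypothetical helper: fill the inputs and copy them to the device.
void setup() {
    for (int i = 0; i < VEC_LEN; ++i) { a_h[i] = float(i + 1); b_h[i] = 2.0f; }
    size_t bytes = VEC_LEN * sizeof(float);
    cudaMalloc(&a_d, bytes);
    cudaMalloc(&b_d, bytes);
    cudaMalloc(&c_d, bytes);
    cudaMemcpy(a_d, a_h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, bytes, cudaMemcpyHostToDevice);
}

// Hypothetical helper: launch with nThreads threads and count wrong elements.
// If clearOutput is set, the output buffer is zeroed first.
int launchAndCountMismatches(int nThreads, bool clearOutput) {
    size_t bytes = VEC_LEN * sizeof(float);
    if (clearOutput)
        cudaMemset(c_d, 0, bytes);
    multiplyVectorsCUDA<<<1, nThreads>>>(a_d, b_d, c_d, VEC_LEN);
    float c_h[VEC_LEN];
    cudaMemcpy(c_h, c_d, bytes, cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int i = 0; i < VEC_LEN; ++i)
        if (c_h[i] != a_h[i] * b_h[i]) ++mismatches;
    return mismatches;
}

int main() {
    setup();
    int full    = launchAndCountMismatches(16, true);   // every element written
    int stale   = launchAndCountMismatches(8, false);   // old results left in c_d
    int cleared = launchAndCountMismatches(8, true);    // unwritten elements stay zero
    printf("full=%d stale=%d cleared=%d\n", full, stale, cleared);
    cudaFree(a_d); cudaFree(b_d); cudaFree(c_d);
    return 0;
}
```

If this explanation applies, zeroing the output buffer with cudaMemset before each launch should make an undersized launch fail the comparison every time, not only after a restart.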

