CUDA & VS2010 problem

Posted 2024-10-20 05:00:07

I have scoured the internet for an answer to this one but couldn't find any. I've installed the CUDA 3.2 SDK (and, just now, CUDA 4.0 RC), and after long hours of fooling around with include directories, NSight, and all the rest, everything seems to work fine. Well, except for one thing: Visual Studio keeps highlighting the <<< >>> operator as a mistake. This happens only on VS2010, not on VS2008.

On VS2010 I also get several warnings of the following sort:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\xdebug(109): warning C4251: 'std::_String_val<_Ty,_Alloc>::_Alval' : class 'std::_DebugHeapAllocator<_Ty>' needs to have dll-interface to be used by clients of class 'std::_String_val<_Ty,_Alloc>'

Update: If I put the entry point in a .cpp file that calls a CUDA kernel, instead of writing main() in a .cu file as I had been doing, the operator is not just highlighted but actually flagged as an error! The same thing happens with VS2008.

Does anyone know how this can be fixed?

Update 2: Here is the code. The main.cpp file:

#include "kernel.cu"

int main()
{
    doStuff();
    return 0;
}

and the .cu file:

#include <iostream>
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cutil_inline.h>
#include <time.h>

using namespace std;

#define N 16

__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int doStuff()
{
    dim3 threadsPerBlock(8, 8);
    dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);

    float A[N][N], B[N][N], C[N][N];

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
        {
            A[i][j] = 0;
            B[i][j] = 0;
            C[i][j] = 0;
        }

    clock_t start = clock();
    MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);
    clock_t end = clock();

    cout << "Took " << float(end - start) << "ms to work out." << endl;
    cin.get();

    return 0;
}

Update 3: Alright, I was (idiotically) including the CUDA code in the .cpp file, so of course it couldn't compile. Now I have CUDA 4.0 up and running on VS2010, but I still get several warnings of the kind explained above.

俯瞰星空 2024-10-27 05:00:07

You cannot do this...

#include "kernel.cu"

This asks the Visual Studio C++ compiler to compile the .cu file as though it were a header. You need a header file that declares doStuff(), and main.cpp should include that header, not the definition.
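
As a rough sketch of that split, the three files could look something like the following. The header name kernel.h, the include guard, and the tiny fill kernel are assumptions made up for illustration; only doStuff() and main() come from the question. The point is that the <<< >>> launch lives only in the .cu file, which nvcc compiles, while main.cpp sees nothing but an ordinary function declaration.

```cuda
// ---- kernel.h (hypothetical file name): declaration only, safe to include from .cpp ----
#ifndef KERNEL_H
#define KERNEL_H

int doStuff();

#endif

// ---- kernel.cu: compiled by nvcc, holds the kernel and the <<< >>> launch ----
#include <cuda_runtime.h>
#include "kernel.h"

// Illustrative kernel (not from the question): each thread writes one value.
__global__ void fill(int *out, int value)
{
    out[threadIdx.x] = value;
}

int doStuff()
{
    int *d_out = 0;
    if (cudaMalloc((void **)&d_out, 32 * sizeof(int)) != cudaSuccess)
        return 1;

    fill<<<1, 32>>>(d_out, 7);

    // Wait for the kernel and pick up any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

// ---- main.cpp: compiled by the Visual C++ compiler, no CUDA syntax here ----
#include "kernel.h"

int main()
{
    return doStuff();
}
```

With this layout the host compiler never parses CUDA-specific syntax, so the launch operator can no longer be reported as a compile error in main.cpp.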

The following might be helpful.

http://www.ademiller.com/blogs/tech/2010/12/using-cudathrust-with-the-parallel-patterns-library/

http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/

Typically I set this up as two projects: one that compiles the .cu files against the 2008 C++ compiler, and another that uses the 2010 compiler to get all the C++0x features.

The warnings you're getting can be fixed by exporting the appropriate templates. Something like this, but you'll have to write a specific export for each type the warnings mention.

#if defined(__CUDACC__)
#define DECLSPECIFIER  __declspec(dllexport)
#define EXPIMP_TEMPLATE

#else
#define DECLSPECIFIER  __declspec(dllimport)
#define EXPIMP_TEMPLATE extern
#endif

EXPIMP_TEMPLATE template class DECLSPECIFIER thrust::device_vector<unsigned long>;

See:

http://support.microsoft.com/default.aspx?scid=KB;EN-US;168958 and
http://msdn.microsoft.com/en-us/library/esew7y1w.aspx

I've written a step-by-step guide to setting up VS 2010 and CUDA 4.0 here:

http://www.ademiller.com/blogs/tech/2011/03/using-cuda-and-thrust-with-visual-studio-2010/

BTW: A better way of timing CUDA code is with the event API.

cudaEvent_t start, stop; 
float time;
cudaEventCreate(&start);
cudaEventCreate(&stop); 
cudaEventRecord( start, 0 ); 
kernel<<<grid,threads>>> ( d_odata, d_idata, size_x, size_y, NUM_REPS); 
cudaEventRecord( stop, 0 ); 
cudaEventSynchronize( stop ); 
cudaEventElapsedTime( &time, start, stop );
cudaEventDestroy( start );
cudaEventDestroy( stop );
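
To tie this back to the question's MatAdd kernel, here is a minimal sketch of event timing around a single launch. Note that the question's doStuff() passes host arrays straight to the kernel, and a kernel can only dereference device memory, so this version copies the data to device buffers first. The helper name timedMatAdd, the flat N*N buffer layout, and the error handling style are assumptions for illustration, not part of the original code.

```cuda
#include <cuda_runtime.h>

#define N 16

// Same element-wise add as in the question, but on flat device buffers.
__global__ void MatAdd(const float *A, const float *B, float *C)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i < N && j < N)
        C[i * N + j] = A[i * N + j] + B[i * N + j];
}

// Hypothetical helper: copies the inputs to the device, times one launch
// with CUDA events, and copies the result back into hC.
// Returns the elapsed time in milliseconds, or -1.0f if a CUDA call failed.
float timedMatAdd(const float *hA, const float *hB, float *hC)
{
    const size_t bytes = N * N * sizeof(float);
    float *dA = 0, *dB = 0, *dC = 0;
    cudaEvent_t start = 0, stop = 0;
    float ms = -1.0f;

    if (cudaMalloc((void **)&dA, bytes) == cudaSuccess &&
        cudaMalloc((void **)&dB, bytes) == cudaSuccess &&
        cudaMalloc((void **)&dC, bytes) == cudaSuccess &&
        cudaEventCreate(&start) == cudaSuccess &&
        cudaEventCreate(&stop) == cudaSuccess)
    {
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        dim3 threadsPerBlock(8, 8);
        dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);

        // The events bracket only the kernel, not the memory copies.
        cudaEventRecord(start, 0);
        MatAdd<<<numBlocks, threadsPerBlock>>>(dA, dB, dC);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            cudaEventElapsedTime(&ms, start, stop);
    }

    // Release whatever was created; cudaFree accepts a null pointer.
    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ms;
}
```

A caller holding `float A[N][N]` style arrays, as in the question, could pass `&A[0][0]` and so on, since a two-dimensional C array is contiguous in memory. Unlike clock() around the launch, which mostly measures how long it takes to queue the kernel, the events are timestamped on the GPU.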
