thrust::sort takes a very long time to compile

Posted on 2024-11-08 00:22:12

I'm trying to compile a block of example code that uses Thrust, to help me learn some CUDA.

I'm using Visual Studio 2010, and I've gotten other examples to compile. However, this example takes upwards of 10 minutes to compile. By selectively commenting out lines, I figured out that it's the thrust::sort line that takes forever (with that one line commented out, compilation takes about 5 seconds).

I found a post somewhere saying that sort is slow to compile in Thrust, and that this was a deliberate decision by the Thrust development team (it's 3x faster at runtime, but takes longer to compile). But that post was from late 2008.

Any idea why this is taking so long?

Also, I'm compiling on a machine with the following specs, so it's not a slow machine:

i7-2600K @ 4.5 GHz
16 GB DDR3 @ 1833 MHz
RAID 0 of 6 GB/s 1 TB drives

As requested, this is the build command that Visual Studio appears to be invoking:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -G0 --keep-dir "Debug\" -maxrregcount=32 --machine 64 --compile -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "Debug\kernel.obj" "C:\Users\Rob\Desktop\VS2010Test\VS2010Test\VS2010Test\kernel.cpp" -clean

Example

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>  // rand

int main(void)
{
    // generate 16M random numbers on the host
    thrust::host_vector<int> h_vec(1 << 24);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);
    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;
    // sort data on the device (this is the line that is slow to compile)
    thrust::sort(d_vec.begin(), d_vec.end());
    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return 0;
}
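
Since the cost seems to sit entirely in the thrust::sort line, one way to contain it might be to keep that call in its own translation unit behind a plain function, so the rest of the project does not have to rebuild it. The following is only a sketch under that assumption. The name sort_ints and the std::sort fallback used when the file is not compiled as CUDA are hypothetical and not part of the example above.

```cpp
#include <algorithm>
#include <vector>

#ifdef __CUDACC__
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#endif

// Hypothetical wrapper (not from the original example): the only place
// where thrust::sort is instantiated. Callers see a plain function, so
// only this file pays the long compile when it changes.
void sort_ints(std::vector<int>& values)
{
#ifdef __CUDACC__
    // Compiled as CUDA: copy to the device, sort there, copy back.
    thrust::device_vector<int> d_vec(values.begin(), values.end());
    thrust::sort(d_vec.begin(), d_vec.end());
    thrust::copy(d_vec.begin(), d_vec.end(), values.begin());
#else
    // Assumed fallback for a host-only build, so the sketch still compiles
    // without the CUDA toolchain.
    std::sort(values.begin(), values.end());
#endif
}
```

With this layout, main would fill a std::vector<int>, call sort_ints on it, and never include thrust/sort.h itself. Whether this actually shortens rebuilds depends on how often the file containing sort_ints changes; it does not make the first compile of that file any faster.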
