Fast CUDA Thrust custom comparison operator

Posted 2024-12-29 04:15:18

I'm evaluating CUDA and currently use the Thrust library to sort numbers.

I'd like to create my own comparator for thrust::sort, but it slows the sort down dramatically!
I created my own less implementation by simply copying the code from functional.h.
However, it seems to be compiled in some other way and runs very slowly.

default comparator: thrust::less() - 94 ms
my own comparator: less() - 906 ms

I'm using Visual Studio 2010. What should I do to get the same performance as option 1?

Complete code:

#include <stdio.h>
#include <stdlib.h>  // rand, srand
#include <time.h>    // time, clock

#include <cuda.h>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>

// Returns a pseudo-random int, reseeding the generator every 10000 calls.
int myRand()
{
    static int counter = 0;
    if (counter++ % 10000 == 0)
        srand((unsigned)(time(NULL) + counter));
    return (rand() << 16) | rand();
}

// User-defined comparator, copied from Thrust's functional.h.
template<typename T>
struct less : public thrust::binary_function<T,T,bool>
{
    __host__ __device__ bool operator()(const T &lhs, const T &rhs) const {
        return lhs < rhs;
    }
};

int main()
{
    // Generate 10 million random keys on the host.
    thrust::host_vector<int> h_vec(10 * 1000 * 1000);
    thrust::generate(h_vec.begin(), h_vec.end(), myRand);

    // Copy the keys to the device.
    thrust::device_vector<int> d_vec = h_vec;

    // Time the device sort that uses the user-defined comparator.
    clock_t clc = clock();
    thrust::sort(d_vec.begin(), d_vec.end(), less<int>());
    printf("%dms\n", (int)((clock() - clc) * 1000 / CLOCKS_PER_SEC));

    return 0;
}
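
For reference, the two timings quoted in the question can be reproduced in a single program. The following is only a sketch under stated assumptions, not the poster's actual benchmark: the names my_less, make_input, time_sort and sorts_agree are hypothetical, the input uses a fixed seed, and CUDA events are swapped in for clock() so that only the device-side sort is measured.

```cuda
#include <cstdio>
#include <cstdlib>

#include <cuda_runtime.h>

#include <thrust/device_vector.h>
#include <thrust/equal.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

// Hypothetical name: same body as the comparator in the question,
// renamed so it cannot be confused with thrust::less.
template <typename T>
struct my_less
{
    __host__ __device__ bool operator()(const T &lhs, const T &rhs) const
    {
        return lhs < rhs;
    }
};

// Builds n pseudo-random keys on the host (fixed seed, an assumption made
// here for repeatability) and copies them to the device.
thrust::device_vector<int> make_input(size_t n)
{
    thrust::host_vector<int> h_vec(n);
    srand(1234);
    for (size_t i = 0; i < n; ++i)
        h_vec[i] = rand();
    return thrust::device_vector<int>(h_vec);
}

// Copies `input` into `output`, sorts `output` with `comp`, and returns the
// elapsed time of the sort in milliseconds, measured with CUDA events.
template <typename Compare>
float time_sort(const thrust::device_vector<int> &input, Compare comp,
                thrust::device_vector<int> &output)
{
    output = input;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    thrust::sort(output.begin(), output.end(), comp);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Returns true when both comparators yield the same sorted sequence.
bool sorts_agree(size_t n)
{
    thrust::device_vector<int> input = make_input(n);
    thrust::device_vector<int> a, b;
    time_sort(input, thrust::less<int>(), a);
    time_sort(input, my_less<int>(), b);
    return thrust::equal(a.begin(), a.end(), b.begin());
}

int main()
{
    thrust::device_vector<int> input = make_input(10 * 1000 * 1000);
    thrust::device_vector<int> sorted;

    float t_default = time_sort(input, thrust::less<int>(), sorted);
    float t_custom  = time_sort(input, my_less<int>(), sorted);

    printf("thrust::less<int>: %.1f ms\n", t_default);
    printf("my_less<int>:      %.1f ms\n", t_custom);
    printf("same result: %s\n", sorts_agree(1000) ? "yes" : "no");
    return 0;
}
```

If the gap persists with this setup, it is consistent with Thrust choosing a different algorithm for the two calls: for primitive key types it can recognise thrust::less and use a radix sort, whereas a user-defined functor is an unrelated type and goes through the comparison-based path. Under that reading, passing thrust::less<int>() (or calling thrust::sort without a comparator) is what recovers the timing of option 1.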
