CUDA: Getting the maximum value in an array and its index

Posted on 2024-11-02 11:08:26

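I have several blocks, each operating on a separate part of an integer array. For example, the first block covers array[0] to array[9] and the second block covers array[10] to array[19].

What is the best way to get the index of the maximum value within each block's part of the array?

Example: block one, a[0] to a[9], holds the following values:
5 10 2 3 4 34 56 3 9 10

So 56 is the maximum, at index 6.

I can't use shared memory because the array can be very large, so it won't fit. Is there a library that can do this quickly?

I know the reduction algorithm, but I think my case is different because I want the index of the maximum element.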
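Wanting the index does not actually take this outside the reduction pattern: argmax is still a reduction if every element carries its position along with its value, and the combine step compares values first and positions second. The sketch below is a CPU model in plain C++, not CUDA code; the names ValIdx, combine and argmax_tree are hypothetical, and the tie rule (smaller index wins) is one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A (value, index) pair: the only state an argmax reduction has to carry.
struct ValIdx {
    int value;
    std::size_t index;
};

// Combine step: keep the larger value; on equal values keep the smaller
// index. This is associative and commutative, so any grouping or order of
// combines (as on a GPU) gives the same answer.
ValIdx combine(ValIdx a, ValIdx b) {
    if (a.value != b.value) return a.value > b.value ? a : b;
    return a.index < b.index ? a : b;
}

// Tree-shaped reduction: each pass folds the upper half of the live
// elements onto the lower half, the pattern a per-block GPU reduction uses.
ValIdx argmax_tree(const std::vector<int>& data) {
    assert(!data.empty());
    std::vector<ValIdx> buf(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) buf[i] = ValIdx{data[i], i};
    for (std::size_t n = buf.size(); n > 1; n = (n + 1) / 2) {
        std::size_t half = (n + 1) / 2;
        for (std::size_t i = 0; i + half < n; ++i)
            buf[i] = combine(buf[i], buf[i + half]);
    }
    return buf[0];
}
```

For the example values 5 10 2 3 4 34 56 3 9 10, argmax_tree returns value 56 at index 6. A CUDA version would keep the same combine step and only change where buf lives and which thread performs each combine.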

吾性傲以野 2024-11-09 11:08:26

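If I understand you correctly, what you want is the index of the maximum value in array A.

If so, I would suggest using the Thrust library. Here is how you would do it: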

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <iostream>

using namespace thrust;

// return the bigger of two (value, index) tuples; tuples compare
// lexicographically, so among equal values the larger index wins
template <class T>
struct bigger_tuple {
    __device__ __host__
    tuple<T,int> operator()(const tuple<T,int> &a, const tuple<T,int> &b) const
    {
        if (a > b) return a;
        else return b;
    }
};

// index of the largest element; vec must not be empty
template <class T>
int max_index(device_vector<T>& vec) {

    // create implicit index sequence [0, 1, 2, ... )
    counting_iterator<int> begin(0);
    counting_iterator<int> end(static_cast<int>(vec.size()));
    tuple<T,int> init(vec[0], 0);  // reading vec[0] copies one element from the device

    tuple<T,int> largest =
        reduce(make_zip_iterator(make_tuple(vec.begin(), begin)),
               make_zip_iterator(make_tuple(vec.end(), end)),
               init, bigger_tuple<T>());
    return get<1>(largest);
}

int main(){

    thrust::host_vector<int> h_vec(1024);
    thrust::sequence(h_vec.begin(), h_vec.end()); // values = indices

    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;

    int index = max_index(d_vec);

    std::cout << "Max index is: " << index << std::endl;
    std::cout << "Value is: " << h_vec[index] << std::endl;

    return 0;
}
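A side effect of comparing (value, index) tuples worth knowing: Thrust's tuple comparison is, as far as I know, lexicographic like std::tuple's, so when several elements share the maximum value, the tuple with the larger index compares greater and max_index reports the last occurrence. Because that ordering is total, the result does not depend on the order in which reduce combines elements. The host-only sketch below mirrors bigger_tuple and max_index with std::tuple; bigger and max_index_host are hypothetical names, and it does not run on the GPU.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Host-side mirror of bigger_tuple using std::tuple, which compares
// lexicographically: value first, then index.
std::tuple<int, int> bigger(const std::tuple<int, int>& a,
                            const std::tuple<int, int>& b) {
    return a > b ? a : b;
}

// Same fold as max_index(): seed with (v[0], 0), then keep the bigger tuple.
int max_index_host(const std::vector<int>& v) {
    assert(!v.empty());
    std::tuple<int, int> best(v[0], 0);
    for (int i = 1; i < static_cast<int>(v.size()); ++i)
        best = bigger(best, std::make_tuple(v[i], i));
    return std::get<1>(best);
}
```

For {3, 7, 7} this returns 2, the later of the two 7s; for the question's example it returns 6.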

一世旳自豪 2024-11-09 11:08:26

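This won't help the original poster, but for anyone who lands on this page looking for an answer, I second the suggestion to use Thrust, which already has a function, thrust::max_element, that does exactly this: it returns an iterator to the largest element, and subtracting the begin iterator gives its index. min_element and minmax_element are also provided. See the Thrust documentation for details.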
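The iterator-to-index step looks like this. Since a GPU example cannot be checked here, the sketch uses std::max_element and std::min_element, which likewise return the first occurrence of the extreme value; the device-side expression is shown only in a comment. argmax_index and argmin_index are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest element: max_element returns an iterator to the
// first occurrence of the maximum; its distance from begin() is the index.
// On the GPU the equivalent expression would be
//   thrust::max_element(d_vec.begin(), d_vec.end()) - d_vec.begin()
std::size_t argmax_index(const std::vector<int>& v) {
    assert(!v.empty());
    auto it = std::max_element(v.begin(), v.end());
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}

// Same idea for the minimum with min_element (also first occurrence).
std::size_t argmin_index(const std::vector<int>& v) {
    assert(!v.empty());
    auto it = std::min_element(v.begin(), v.end());
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}
```

On the question's example, argmax_index gives 6 and argmin_index gives 2.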

一袭水袖舞倾城 2024-11-09 11:08:26

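Besides the suggestion to use Thrust, you can also use the cuBLAS function cublasIsamax. Note that it works on float arrays, looks for the element with the largest absolute value, and returns a 1-based index.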
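To make those semantics concrete, here is a CPU model of the value cublasIsamax reports, assuming its documented behavior (largest absolute value, smallest index on ties, 1-based result). It is not the cuBLAS API; the real call is cublasIsamax(handle, n, x, incx, &result) on a device pointer to float, so the question's int data would need converting first. isamax_model is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU model of the result cublasIsamax reports (this is NOT the cuBLAS API):
// the 1-based position of the element with the largest absolute value,
// keeping the smallest index on ties.
int isamax_model(const std::vector<float>& x) {
    assert(!x.empty());
    int best = 0;
    for (int i = 1; i < static_cast<int>(x.size()); ++i)
        if (std::fabs(x[i]) > std::fabs(x[best])) best = i;  // strict: first max wins
    return best + 1;  // Fortran-style 1-based index
}
```

For the question's example this gives 7, i.e. 0-based index 6. For {-60, 56} it gives 1, pointing at -60 rather than at the largest value, so the result only matches the maximum when all values are non-negative.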

赤濁 2024-11-09 11:08:26

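As far as shared memory is concerned, the size of the array is largely irrelevant: what has to fit in shared memory is one entry per thread in a block, so the limiting factor is the number of threads per block, not the size of the array. One solution is to have each thread block process a chunk of the array as large as the block. That is, with 512 threads per block, block n looks at array[512 * n] through array[512 * n + 511]. Each block performs a reduction to find the largest element in its chunk; to get the index as well, keep each element's index alongside its value during the reduction. Then copy the per-chunk maxima back to the host and do a simple linear search to find the largest value in the whole array. Each reduction pass on the GPU cuts the size of that linear search by a factor of 512. Depending on the size of the array, you may want to do more reduction passes before copying the data back. (If your array has 3 * 512^10 elements, you might do 10 reduction passes on the GPU and let the host search the remaining 3 data points.)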
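The scheme can carry the index as well as the value. Below is a CPU model of it in plain C++, not a CUDA kernel: block_size plays the role of threads per block, the per-block buffer stands in for shared memory, and repeated passes shrink the data until the "host" finishes with a linear search. The names Best, pick, block_pass, two_level_argmax and host_limit are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate maximum together with its position in the original array.
struct Best {
    int value;
    std::size_t index;
};

// Keep the larger value; on equal values keep the smaller index.
Best pick(Best a, Best b) {
    if (a.value != b.value) return a.value > b.value ? a : b;
    return a.index < b.index ? a : b;
}

// CPU model of one GPU pass. "Block" n owns in[n*block_size] up to
// in[n*block_size + block_size - 1], copies that chunk into a per-block
// buffer (the role shared memory plays on the GPU), tree-reduces it, and
// emits one (value, index) pair.
std::vector<Best> block_pass(const std::vector<Best>& in, std::size_t block_size) {
    std::vector<Best> out;
    for (std::size_t start = 0; start < in.size(); start += block_size) {
        std::size_t n = std::min(block_size, in.size() - start);
        auto first = in.begin() + static_cast<std::ptrdiff_t>(start);
        std::vector<Best> shared(first, first + static_cast<std::ptrdiff_t>(n));
        for (std::size_t stride = 1; stride < n; stride *= 2)
            for (std::size_t t = 0; t + stride < n; t += 2 * stride)
                shared[t] = pick(shared[t], shared[t + stride]);
        out.push_back(shared[0]);
    }
    return out;
}

// Run passes until at most host_limit pairs remain, then finish with a
// linear search on the "host". Requires block_size >= 2 so each pass shrinks.
Best two_level_argmax(const std::vector<int>& data, std::size_t block_size,
                      std::size_t host_limit) {
    assert(!data.empty() && block_size >= 2 && host_limit >= 1);
    std::vector<Best> cur(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) cur[i] = Best{data[i], i};
    while (cur.size() > host_limit) cur = block_pass(cur, block_size);
    Best best = cur[0];
    for (const Best& b : cur) best = pick(best, b);  // host linear search
    return best;
}
```

With block_size = 4 and the question's example, the first pass leaves three candidates (10 at 1, 56 at 6, 10 at 9) and the second leaves 56 at index 6. Only block_size entries per block ever sit in the per-block buffer, which is the point the answer makes about shared memory.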

硪扪都還晓 2024-11-09 11:08:26

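One thing to watch out for when doing a max-value-plus-index reduction: if the array contains more than one element equal to the maximum (in your example, if two or more values were 56), the returned index is not unique and may differ from run to run, because the timing of thread ordering on the GPU is not deterministic.

To get around this, you can use a unique ordering key, such as threadid + threadsperblock * blockid, or the element's index position if that is unique. The max test then looks along these lines: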

// on equal values, the unique ordering key decides, so the winner is deterministic
if (a > max_so_far || (a == max_so_far && order_a > order_max_so_far))
{
    max_so_far = a;
    index_max_so_far = index_a;
    order_max_so_far = order_a;
}

(Index and order can be the same variable, depending on the application.)
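To see why the tie-break helps, the CPU sketch below applies the same max test while visiting the candidates in several different orders, standing in for unpredictable thread timing. The names Candidate, consider and max_in_visit_order are hypothetical, and order is simply set to the element index. For {56, 3, 56, 10} every visit order yields index 2; without the order_a > order_max_so_far clause, whichever 56 was visited first would win.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate in the max search: the value, the index to report, and a
// unique ordering key used only to break ties.
struct Candidate {
    int value;
    std::size_t index;
    std::size_t order;
};

// The max test from the answer, with explicit parentheses.
void consider(Candidate& best, const Candidate& c) {
    if (c.value > best.value || (c.value == best.value && c.order > best.order))
        best = c;
}

// Visit candidates in the given order (a stand-in for nondeterministic
// thread timing); the tie-break makes the winner independent of that order.
Candidate max_in_visit_order(const std::vector<Candidate>& cs,
                             const std::vector<std::size_t>& visit) {
    assert(!cs.empty() && visit.size() == cs.size());
    Candidate best = cs[visit[0]];
    for (std::size_t k = 1; k < visit.size(); ++k) consider(best, cs[visit[k]]);
    return best;
}
```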
