Compile error in matrix multiplication with OpenMP target offloading

Posted on 2025-02-05 10:12:58 · 1790 characters · 3 views · 0 comments

I am currently trying to implement a simple multiplication of two n×n matrices using OpenMP target offloading. The code is taken from here:

template<typename T>
void multiplyJIK(T *A, T *B, T *C, uint64_t size) {

    #pragma omp target data device(0) map(to: A[0:size*size], B[0:size * size], size) map(tofrom: C[0:size * size])
    {
        #pragma omp target teams device(0) num_teams(32768) thread_limit(512) \
            map(to: A[0:size*size], B[0:size * size], size) map(tofrom: C[0:size * size]) \
            default(none) shared(A, B, C, size)

        #pragma omp distribute parallel for num_threads(512) dist_schedule(static, 512) \
            default(none) shared(A, B, C, size)
    
        for (uint64_t j = 0; j < size; ++j) {
            for (uint64_t i = 0; i < size; ++i) {
                for (uint64_t k = 0; k < size; ++k) {
                    C[i * size + j] += A[i * size + k] * B[k * size + j];
                }
            }
        }
    }
}

It should multiply the two matrices A and B and store the result in C. The matrices are represented as one-dimensional arrays of length size * size.
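A plain serial host version of the same JIK loop gives you something to compare the GPU result against. This is a minimal sketch that assumes the row-major layout described above. Element (i, j) sits at index i * size + j, so C[i*size+j] accumulates the sum over k of A[i*size+k] * B[k*size+j]. The name multiplyJIKHost is hypothetical and is not part of the original code.

```cpp
#include <cstdint>

// Hypothetical serial reference for the JIK loop order in the question.
// Matrices are row-major 1-D arrays of length size * size:
// element (row, col) is stored at index row * size + col.
template <typename T>
void multiplyJIKHost(const T *A, const T *B, T *C, std::uint64_t size) {
    for (std::uint64_t j = 0; j < size; ++j) {          // column of C
        for (std::uint64_t i = 0; i < size; ++i) {      // row of C
            // Start from the existing C value to keep the "+=" semantics.
            T sum = C[i * size + j];
            for (std::uint64_t k = 0; k < size; ++k) {
                sum += A[i * size + k] * B[k * size + j];
            }
            C[i * size + j] = sum;
        }
    }
}
```

Once the offloaded kernel compiles, you can run both versions on the same small input and compare C element by element to check it.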

For my test, T is float. I compile the code with the NVHPC toolkit (nvc++ -std=c++17 -mp=gpu -target=gpu main.cpp -o matmul) and get this error:

error: item must appear in a SHARED or PRIVATE clause:
                          C[i * size + j] += A[i * size + k] * B[k * size + j];
                          ^
       detected during instantiation of "void Target::multiplyJIK(T *, T *, T *, uint64_t) [with T=float]"

I don't understand this error, since the C array should be mapped correctly (map(tofrom: C...)) and it also appears in the shared(...) clause. Am I missing something in the code, or is this a problem with the compile flags?
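One way to narrow this down is to write the kernel as a single combined construct, without default(none). This also means A, B and C no longer appear in both a map clause and a shared(...) list. At least OpenMP 4.5 restricts a list item from appearing in both a map clause and a data-sharing attribute clause on the same construct. Later versions relax this for combined constructs. Note also that the inner target teams re-maps A, B and C, which the enclosing target data region has already mapped, so that second mapping is redundant.

The sketch below is an assumption to test, not a confirmed fix for this nvc++ diagnostic. The function name multiplyJIKCombined is hypothetical. It also leaves the team and thread counts to the implementation instead of num_teams(32768), thread_limit(512) and dist_schedule(static, 512).

```cpp
#include <cstdint>

// Hypothetical variant: one combined target construct, no default(none),
// no shared(...) lists, and each array mapped exactly once.
// Without OpenMP enabled the pragma is ignored and the loops run serially.
template <typename T>
void multiplyJIKCombined(const T *A, const T *B, T *C, std::uint64_t size) {
    const std::uint64_t n2 = size * size;  // elements per matrix
    // collapse(2) distributes the (j, i) pairs across teams and threads.
    // Every pair writes a different C element, so there is no data race.
    // The scalar "size" is firstprivate by default on the target construct.
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:n2], B[0:n2]) map(tofrom: C[0:n2])
    for (std::uint64_t j = 0; j < size; ++j) {
        for (std::uint64_t i = 0; i < size; ++i) {
            T sum = T(0);  // per-iteration accumulator
            for (std::uint64_t k = 0; k < size; ++k) {
                sum += A[i * size + k] * B[k * size + j];
            }
            C[i * size + j] += sum;  // keep the original "+=" semantics
        }
    }
}
```

Accumulating into a local sum and writing C once per (i, j) pair keeps the inner loop free of repeated global-memory updates. If this version compiles under nvc++ -mp=gpu but the original does not, that points at the clause combination rather than the compile flags.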
