OpenMP target offloading matrix multiplication compile error

Posted on 2025-02-05 10:12:58

I am currently trying to implement a simple matrix multiplication of two n×n matrices using OpenMP target offloading. The code is taken from here:

template<typename T>
void multiplyJIK(T *A, T *B, T *C, uint64_t size) {

    #pragma omp target data device(0) map(to: A[0:size * size], B[0:size * size], size) map(tofrom: C[0:size * size])
    {
        #pragma omp target teams device(0) num_teams(32768) thread_limit(512) \
            map(to: A[0:size*size], B[0:size * size], size) map(tofrom: C[0:size * size]) \
            default(none) shared(A, B, C, size)

        #pragma omp distribute parallel for num_threads(512) dist_schedule(static, 512) \
            default(none) shared(A, B, C, size)
    
        for (uint64_t j = 0; j < size; ++j) {
            for (uint64_t i = 0; i < size; ++i) {
                for (uint64_t k = 0; k < size; ++k) {
                    C[i * size + j] += A[i * size + k] * B[k * size + j];
                }
            }
        }
    }
}

It should multiply the two matrices A and B and store the result in C. The matrices are represented as one-dimensional arrays of length size * size.

For my test, T is float. I compile the code with the NVHPC toolkit: nvc++ -std=c++17 -mp=gpu -target=gpu main.cpp -o matmul and get this error:

error: item must appear in a SHARED or PRIVATE clause:
                          C[i * size + j] += A[i * size + k] * B[k * size + j];
                          ^
       detected during instantiation of "void Target::multiplyJIK(T *, T *, T *, uint64_t) [with T=float]"

I don't understand this error, since the C array should be correctly mapped (map(tofrom: C...)) and it appears in the shared(...) clause. Am I missing something in the code, or is this a problem with the compile flags?
