Intel MKL error: incorrect parameter when calling gemm()

Posted 2025-01-14 00:48:24

I have this code:

void my_function(double *image_vector, double *endmembers, double *abundanceVector, int it, int lines, int samples, int bands, int targets)
{
    double *h_Num;
    double *h_aux;
    double *h_Den;
    int lines_samples = lines*samples;
        
    h_Num = (double*) malloc(lines_samples * targets * sizeof(double));
    h_aux = (double*) malloc(lines_samples * bands * sizeof(double));
    h_Den = (double*) malloc(lines_samples * targets * sizeof(double));

    sycl::queue my_queue{sycl::default_selector{}};

        std::cout << "Device: "
                  << my_queue.get_device().get_info<sycl::info::device::name>()
                  << std::endl;
    
    // USM declaration
    double* image_vector_usm = sycl::malloc_shared<double>(lines_samples*bands, my_queue);
    double* endmembers_usm = sycl::malloc_shared<double>(targets*bands, my_queue);
    double* abundanceVector_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue); 
    double* h_Num_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue);
    double* h_aux_usm = sycl::malloc_shared<double>(lines_samples*bands, my_queue);
    double* h_Den_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue);
    auto nonTrans = oneapi::mkl::transpose::nontrans;
    auto yesTrans = oneapi::mkl::transpose::trans;
    
    int i,j;
    
    // We copy the parameters values into the USM variables // Maybe the mistake is here?
    std::memcpy(image_vector_usm, image_vector,sizeof(double) * lines_samples*bands);
    std::memcpy(endmembers_usm, endmembers,sizeof(double) * targets*bands);
    
    // Initialization
    for(i=0; i<lines_samples*targets; i++)
        abundanceVector_usm[i]=1;

    double alpha = 1.0;
    double beta = 0.0;

    // Start of callings to dgemm()

      oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, yesTrans, lines_samples, targets, bands, alpha, image_vector_usm,lines_samples, endmembers_usm, targets, beta, h_Num_usm, lines_samples);

    my_queue.wait_and_throw();

    for(i=0; i<it; i++)
    { 
        oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, nonTrans, lines_samples, targets, bands, alpha, abundanceVector_usm, lines_samples, endmembers_usm, targets, beta, h_aux_usm, lines_samples);
        
        my_queue.wait_and_throw();

        oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, yesTrans, lines_samples, targets, bands, alpha,h_aux_usm, lines_samples, endmembers_usm, targets, beta, h_Den_usm, lines_samples);

        my_queue.wait_and_throw();

        my_queue.parallel_for(sycl::range<1> (lines_samples*targets), [=] (sycl::id<1> j){
            abundanceVector_usm[j] = abundanceVector_usm[j]*(h_Num_usm[j]/h_Den_usm[j]);
        }).wait();
    }

    free(h_Den);
    free(h_Num);
    free(h_aux);
    
    // Free SYCL
    free(image_vector_usm, my_queue);
    free(endmembers_usm, my_queue);
    free(abundanceVector_usm, my_queue);
    free(h_Num_usm, my_queue);
    free(h_aux_usm, my_queue);
    free(h_Den_usm, my_queue);
}

This is the makefile. I borrowed it from a default oneMKL example called "matrix_mul_mkl" and adapted it to my file names. The file is called GNUmakefile:

# Makefile for GNU Make

default: run

all: run

run: my_code

MKL_COPTS = -DMKL_ILP64  -I"${MKLROOT}/include"
MKL_LIBS = -L${MKLROOT}/lib/intel64 -lmkl_sycl -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lsycl -lOpenCL -lpthread -lm -ldl

DPCPP_OPTS = $(MKL_COPTS) -fsycl-device-code-split=per_kernel $(MKL_LIBS)

my_code: my_code.cpp RS_algorithm.cpp # This RS file is also needed to compile, nothing strange there I believe, completely sequential and just calls the function in my_code.
    dpcpp $^ -o $@ $(DPCPP_OPTS)


clean:
    -rm -f my_code

.PHONY: clean run all

I know that mixing up the ILP64 and LP64 libraries sometimes causes trouble, but the matrix_mul example mentioned above works with these settings, so that can't be the cause, can it?

And this is what the execution returns:

Device: Intel whatever model...
Intel MKL ERROR: Parameter 11 was incorrect on entry to cblas_dgemm.
Segmentation fault.

I have put some prints right under the calls to gemm() and done some tests; the first call seems to execute, but not the second one.

I have tried and checked everything, what is wrong?

Thank you in advance!


野稚 2025-01-21 00:48:25

By default, most compilers treat integers ("int" in C or C++, "INTEGER" in Fortran) as 32 bits long, so most applications need to link against the LP64 MKL libraries.
(https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/linking-your-application-with-onemkl/linking-in-detail/linking-with-interface-libraries/using-the-ilp64-interface-vs-lp64-interface.html)
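To make the integer-width point concrete, here is a minimal sketch in plain C++ (no MKL needed). The alias `blas_int` and the helper `element_count` are hypothetical names introduced only for illustration; the alias merely mirrors the idea that the integer type is 32-bit under LP64 and 64-bit when `MKL_ILP64` is defined, as in the makefile from the question. The helper shows a related width issue: a product such as `lines * samples * bands` can overflow a 32-bit `int` unless it is widened first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias (not an MKL type): the width follows the MKL_ILP64 macro,
// which the question's makefile sets with -DMKL_ILP64.
#ifdef MKL_ILP64
using blas_int = std::int64_t;  // ILP64: 64-bit integer arguments
#else
using blas_int = std::int32_t;  // LP64: 32-bit integer arguments
#endif

// Hypothetical helper: widen to 64 bits before multiplying, so the element
// count of a lines x samples x bands image cannot overflow a 32-bit int.
constexpr std::int64_t element_count(int lines, int samples, int bands) {
    return static_cast<std::int64_t>(lines) * samples * bands;
}
```

If the compile flags and the linked interface library disagree about this width, integer arguments can be misread by the library, which is why the two settings are normally changed together.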

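So try linking against the LP64 interface and see whether it works.
I also suggest setting MKL_VERBOSE=1
(https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-output/using-onemkl-verbose-mode.html)
and then running your code, so that you can see which parameters were passed to the function named in your error message.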
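As a small aid for reading that error message, the sketch below maps a 1-based parameter position to the argument name in the standard `cblas_dgemm` declaration. It assumes the number in "Parameter 11 was incorrect on entry to cblas_dgemm" counts arguments in that declaration order; the helper name `cblas_dgemm_param_name` is hypothetical and not part of MKL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Argument names of cblas_dgemm in declaration order (position 1 is Layout).
static const char* const kCblasDgemmParams[] = {
    "Layout", "TransA", "TransB", "M", "N", "K", "alpha",
    "A", "lda", "B", "ldb", "beta", "C", "ldc"};

// Hypothetical helper: name for a 1-based position, or "?" if out of range.
inline std::string cblas_dgemm_param_name(std::size_t position) {
    constexpr std::size_t count =
        sizeof(kCblasDgemmParams) / sizeof(kCblasDgemmParams[0]);
    return (position >= 1 && position <= count)
               ? kCblasDgemmParams[position - 1]
               : "?";
}
```

Under that assumption, position 11 is `ldb`, the leading dimension of the second matrix, so that is the value to look at first in the verbose output.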

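You can also refer to the examples that ship with oneMKL. There is a similar example under the mkl directory on your system, in \oneAPI\mkl\2022.0.2\examples\examples_dpcpp\dpcpp\blas\source; the file is called usm_gemm.cpp,
and I think it should help you.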


多彩岁月 2025-01-21 00:48:25

I found the solution. I was using the row_major version of the gemm call, and for this code I had to call the column_major version. Be careful!
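A possible explanation for why the layout matters is the standard BLAS rule for leading dimensions: in row-major storage a leading dimension must be at least the number of stored columns, while in column-major storage it must be at least the number of stored rows. The sketch below computes those minimums in plain C++. The names `Layout`, `Op`, `MinLd`, and `min_leading_dims` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { RowMajor, ColMajor };
enum class Op { NoTrans, Trans };

struct MinLd { std::int64_t lda, ldb, ldc; };

// Smallest legal leading dimensions for C (m x n) = op(A) (m x k) * op(B) (k x n).
constexpr MinLd min_leading_dims(Layout layout, Op ta, Op tb,
                                 std::int64_t m, std::int64_t n, std::int64_t k) {
    // Stored shape (rows x cols) of each operand before op() is applied.
    const std::int64_t a_rows = (ta == Op::NoTrans) ? m : k;
    const std::int64_t a_cols = (ta == Op::NoTrans) ? k : m;
    const std::int64_t b_rows = (tb == Op::NoTrans) ? k : n;
    const std::int64_t b_cols = (tb == Op::NoTrans) ? n : k;
    // Row-major: the leading dimension spans one stored row (>= cols).
    // Column-major: it spans one stored column (>= rows).
    return (layout == Layout::RowMajor) ? MinLd{a_cols, b_cols, n}
                                        : MinLd{a_rows, b_rows, m};
}
```

Take the first gemm call from the question (nonTrans, yesTrans, m = lines_samples, n = targets, k = bands) with illustrative sizes lines_samples = 1000, targets = 5, bands = 200, which are assumed here and not given in the question. The minimums come out as lda 200, ldb 200, ldc 5 for row-major and lda 1000, ldb 5, ldc 1000 for column-major. The values actually passed (lda = lines_samples, ldb = targets, ldc = lines_samples) meet the column-major minimums, but ldb = targets is too small for row-major whenever targets < bands. Note that switching the layout also changes how the buffers are interpreted, so the data must really be stored that way.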
