Intel MKL ERROR: incorrect parameter when calling gemm()

Posted on 2025-01-14 00:48:24

I have this code:

```cpp
void my_function(double *image_vector, double *endmembers, double *abundanceVector, int it, int lines, int samples, int bands, int targets)
{
    double *h_Num;
    double *h_aux;
    double *h_Den;
    int lines_samples = lines*samples;

    h_Num = (double*) malloc(lines_samples * targets * sizeof(double));
    h_aux = (double*) malloc(lines_samples * bands * sizeof(double));
    h_Den = (double*) malloc(lines_samples * targets * sizeof(double));

    sycl::queue my_queue{sycl::default_selector{}};

    std::cout << "Device: "
              << my_queue.get_device().get_info<sycl::info::device::name>()
              << std::endl;

    // USM declaration
    double* image_vector_usm = sycl::malloc_shared<double>(lines_samples*bands, my_queue);
    double* endmembers_usm = sycl::malloc_shared<double>(targets*bands, my_queue);
    double* abundanceVector_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue);
    double* h_Num_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue);
    double* h_aux_usm = sycl::malloc_shared<double>(lines_samples*bands, my_queue);
    double* h_Den_usm = sycl::malloc_shared<double>(lines_samples*targets, my_queue);
    auto nonTrans = oneapi::mkl::transpose::nontrans;
    auto yesTrans = oneapi::mkl::transpose::trans;

    int i,j;

    // We copy the parameters values into the USM variables // Maybe the mistake is here?
    std::memcpy(image_vector_usm, image_vector, sizeof(double) * lines_samples*bands);
    std::memcpy(endmembers_usm, endmembers, sizeof(double) * targets*bands);

    // Initialization
    for(i=0; i<lines_samples*targets; i++)
        abundanceVector_usm[i]=1;

    double alpha = 1.0;
    double beta = 0.0;

    // Start of callings to dgemm()

    oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, yesTrans, lines_samples, targets, bands, alpha, image_vector_usm, lines_samples, endmembers_usm, targets, beta, h_Num_usm, lines_samples);

    my_queue.wait_and_throw();

    for(i=0; i<it; i++)
    {
        oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, nonTrans, lines_samples, targets, bands, alpha, abundanceVector_usm, lines_samples, endmembers_usm, targets, beta, h_aux_usm, lines_samples);

        my_queue.wait_and_throw();

        oneapi::mkl::blas::row_major::gemm(my_queue, nonTrans, yesTrans, lines_samples, targets, bands, alpha, h_aux_usm, lines_samples, endmembers_usm, targets, beta, h_Den_usm, lines_samples);

        my_queue.wait_and_throw();

        my_queue.parallel_for(sycl::range<1> (lines_samples*targets), [=] (sycl::id<1> j){
            abundanceVector_usm[j] = abundanceVector_usm[j]*(h_Num_usm[j]/h_Den_usm[j]);
        }).wait();
    }

    free(h_Den);
    free(h_Num);
    free(h_aux);

    // Free SYCL
    free(image_vector_usm, my_queue);
    free(endmembers_usm, my_queue);
    free(abundanceVector_usm, my_queue);
    free(h_Num_usm, my_queue);
    free(h_aux_usm, my_queue);
    free(h_Den_usm, my_queue);
}
```

This is the makefile. I borrowed it from a default oneMKL example called "matrix_mul_mkl" and adapted it to my file names. The makefile is called GNUmakefile:

```makefile
# Makefile for GNU Make

default: run

all: run

run: my_code

MKL_COPTS = -DMKL_ILP64  -I"${MKLROOT}/include"
MKL_LIBS = -L${MKLROOT}/lib/intel64 -lmkl_sycl -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lsycl -lOpenCL -lpthread -lm -ldl

DPCPP_OPTS = $(MKL_COPTS) -fsycl-device-code-split=per_kernel $(MKL_LIBS)

my_code: my_code.cpp RS_algorithm.cpp # This RS file is also needed to compile, nothing strange there I believe, completely sequential and just calls the function in my_code.
	dpcpp $^ -o $@ $(DPCPP_OPTS)


clean:
	-rm -f my_code

.PHONY: clean run all
```

I know that there are sometimes problems with the ILP64 or LP64 libraries, but the matrix_mul example mentioned above works, so that can't be the problem, can it?

And this is what the execution returns:

```
Device: Intel whatever model...
Intel MKL ERROR: Parameter 11 was incorrect on entry to cblas_dgemm.
Segmentation fault.
```

I put some prints right after the calls to gemm() and ran some tests; the first call seems to execute, but the second one does not.

I have tried and checked everything. What is wrong?

Thank you in advance!

Comments (2)

野稚 2025-01-21 00:48:25

By default, most compilers treat integers ('int' in C/C++, 'INTEGER' in Fortran) as 32 bits long, so most applications need to be linked with the LP64 MKL libraries.
(https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/linking-your-application-with-onemkl/linking-in-detail/linking-with-interface-libraries/using-the-ilp64-interface-vs-lp64-interface.html)

So try linking against the LP64 interface and see if it works.
I would also suggest setting MKL_VERBOSE=1
(https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-output/using-onemkl-verbose-mode.html)
and then running your code, so that you can see which parameters are actually passed to the function your error message complains about.

You can also refer to the examples that come with oneMKL. There is a similar example under the mkl directory on your system, in \oneAPI\mkl\2022.0.2\examples\examples_dpcpp\dpcpp\blas\source, in the file usm_gemm.cpp, which I expect should help you.
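To make the "Parameter 11" message more concrete: in the CBLAS prototype `cblas_dgemm(layout, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)` the 11th argument is `ldb`, and what counts as a valid leading dimension depends on the layout. The sketch below is not oneMKL code: `Layout`, `GemmArgs` and `first_bad_ld` are hypothetical names, and it encodes only the standard BLAS lower bounds on `lda`, `ldb` and `ldc` (MKL also validates other arguments, possibly in a different order).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of oneMKL: it checks only the standard BLAS
// lower bounds on the leading dimensions of C = alpha*op(A)*op(B) + beta*C.
enum class Layout { RowMajor, ColMajor };

struct GemmArgs {
    Layout layout;
    bool transA, transB;
    std::int64_t m, n, k, lda, ldb, ldc;
};

// Returns 0 if lda, ldb and ldc are all valid; otherwise the 1-based position
// of the first invalid one in cblas_dgemm's argument list
// (lda = 9, ldb = 11, ldc = 14).
int first_bad_ld(const GemmArgs& g) {
    assert(g.m >= 0 && g.n >= 0 && g.k >= 0);
    const bool row = (g.layout == Layout::RowMajor);
    // A leading dimension must cover one stored row (row-major) or one
    // stored column (column-major) of the matrix as it sits in memory.
    const std::int64_t minA = row ? (g.transA ? g.m : g.k) : (g.transA ? g.k : g.m);
    const std::int64_t minB = row ? (g.transB ? g.k : g.n) : (g.transB ? g.n : g.k);
    const std::int64_t minC = row ? g.n : g.m;
    if (g.lda < std::max<std::int64_t>(1, minA)) return 9;
    if (g.ldb < std::max<std::int64_t>(1, minB)) return 11;
    if (g.ldc < std::max<std::int64_t>(1, minC)) return 14;
    return 0;
}
```

For example, with hypothetical sizes lines_samples = 1000, bands = 50 and targets = 5 (the question does not give the real ones), the arguments of the question's first call (nontrans, trans, m = lines_samples, n = targets, k = bands, lda = ldc = lines_samples, ldb = targets) give 11 under `Layout::RowMajor`, because a transposed B in row-major storage needs ldb >= k = bands, and 0 under `Layout::ColMajor`, where ldb >= n = targets is enough.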

多彩岁月 2025-01-21 00:48:25

I found the solution. I was using the row_major version of the gemm call, but for this code I had to call the column_major version. Be careful!
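The column_major call is only correct if the arrays really are stored column by column (for example, all lines_samples values of one band contiguous). As an illustration of that convention, here is a naive reference for column-major C = alpha·op(A)·op(B) + beta·C, in which element (i, j) of a matrix with leading dimension ld sits at index i + j*ld. It is a sketch for checking small cases by hand, not how oneMKL computes gemm; `ref_gemm_colmajor` and `example_h_num` are hypothetical names, and the tiny matrices are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference, not oneMKL: column-major C = alpha*op(A)*op(B) + beta*C.
// Element (i, j) of a column-major matrix with leading dimension ld is at i + j*ld.
void ref_gemm_colmajor(bool transA, bool transB,
                       std::int64_t m, std::int64_t n, std::int64_t k,
                       double alpha, const double* a, std::int64_t lda,
                       const double* b, std::int64_t ldb,
                       double beta, double* c, std::int64_t ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (std::int64_t p = 0; p < k; ++p) {
                // op(A)(i, p): A is m x k if not transposed, else stored as k x m.
                const double aip = transA ? a[p + i * lda] : a[i + p * lda];
                // op(B)(p, j): B is k x n if not transposed, else stored as n x k.
                const double bpj = transB ? b[j + p * ldb] : b[p + j * ldb];
                sum += aip * bpj;
            }
            // As in BLAS, C is not read when beta == 0.
            double& cij = c[i + j * ldc];
            cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * cij;
        }
    }
}

// Tiny made-up instance of the question's first call:
// h_Num (ls x targets) = image (ls x bands) * endmembers^T,
// with ls = 3, bands = 2, targets = 2, all stored column-major.
std::vector<double> example_h_num() {
    const std::int64_t ls = 3, bands = 2, targets = 2;
    // image = [[1, 2], [3, 4], [5, 6]], stored column by column.
    const std::vector<double> image{1, 3, 5, 2, 4, 6};
    // endmembers = [[1, 0], [1, 1]] (targets x bands), column by column.
    const std::vector<double> endmembers{1, 1, 0, 1};
    std::vector<double> h_num(static_cast<std::size_t>(ls * targets));
    // Same argument pattern as the question: nontrans, trans,
    // lda = ldc = lines_samples, ldb = targets.
    ref_gemm_colmajor(false, true, ls, targets, bands, 1.0,
                      image.data(), ls, endmembers.data(), targets,
                      0.0, h_num.data(), ls);
    return h_num;
}
```

`example_h_num()` returns {1, 3, 5, 3, 7, 11}, which is the 3×2 product [[1, 3], [3, 7], [5, 11]] stored column by column. With this storage, lda = lines_samples and ldb = targets are exactly the column lengths of the stored matrices, which is why those leading dimensions fit the column-major interface.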
