Does mkl_vml_serv_threader in the gprof profile mean MKL is not running sequentially?

Posted 2025-01-30 17:00:04


We're running an application that is in the process of being enhanced with MKL BLAS. We've been told not to hyperthread.

To keep the multithreaded (so-called parallel?) version from being considered at compile time, that is, to disable multithreading and use only MKL's sequential vectorization, we removed the threaded library from the FindMKL CMake file. The compiler was icc 2019.

To disable multithreading at runtime, we launched the tasks under Slurm with --threads-per-core=1 set in the batch file.

Yet we are not sure how to double-check that MKL is only running sequentially, so we collected a profile with gprof (summed over 4 cores on a single cluster node).

The following functions appear in the flat profile, albeit each consuming less than 0.3%. Are they evidence to support the idea that MKL is multithreading, i.e. "not running in sequential mode"?

mkl_vml_serv_threader_d_2iI_1oI

mkl_vml_serv_threader_d_1i_1o

mkl_vml_serv_threader_d_1iI_1oI

mkl_vml_serv_threader_d_2i_1o


Comments (1)

可爱咩 2025-02-06 17:00:04


By default, Intel® oneAPI Math Kernel Library uses a number of OpenMP threads equal to the number of physical cores on the system, and it runs on all available physical cores unless one of the options mentioned below is used.

Intel compilers such as icc (in recent versions) have a compiler option -qmkl=[lib], where lib indicates which library files should be linked; the possible values are as follows.

parallel:

Tells the compiler to link using the threaded libraries in oneMKL. This is the default if the option is specified with no lib.

sequential:

Tells the compiler to link using the sequential libraries in oneMKL.

cluster:

Tells the compiler to link using the cluster-specific libraries and the sequential libraries in oneMKL.

So if you want to run it sequentially, use -qmkl=sequential.
Since you are using icc 2019, check icc --help and search for the option (I guess it is -mkl, not -qmkl).
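For an icc 2019 build on Linux, the sequential link can be requested either through the compiler driver flag or spelled out explicitly. A minimal sketch, assuming a 64-bit Linux system with the LP64 interface and MKLROOT set in the environment (the explicit library names follow Intel's standard link-line conventions and may differ for other configurations):

```shell
# Compiler-driver shorthand (icc 2019 uses -mkl; -qmkl is the newer spelling):
icc myapp.c -o myapp -mkl=sequential

# Roughly equivalent explicit link line against the sequential layer
# (assumes MKLROOT is set and a 64-bit LP64 build):
icc myapp.c -o myapp \
    -L"${MKLROOT}/lib/intel64" \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
```

The key point is that mkl_sequential is linked in place of a threading layer such as mkl_intel_thread, which is what removing the threaded library from the FindMKL CMake file is meant to achieve.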

Additionally, you can make use of the link line advisor tool (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html?wapkw=link%20line%20advisor#gs.0myxfc), which helps you see the required libraries for your specific use case.

As mentioned in the comments, setting MKL_VERBOSE=1 prints details about the MKL version, the parameters of each MKL call, the time taken by the function, and NThr, which indicates the number of threads, along with some other details; you can refer to the given link.
e.g.: MKL_VERBOSE=1 ./a.out
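As a runtime cross-check, this can be combined with MKL_NUM_THREADS (an MKL environment variable that caps MKL's thread count independently of OMP_NUM_THREADS) while filtering the verbose trace for the NThr field. A sketch, assuming the verbose lines carry an NThr:<n> token as in recent MKL versions:

```shell
# Cap MKL at one thread at runtime, regardless of how the binary was linked:
export MKL_NUM_THREADS=1

# Run with verbose tracing and list the distinct thread counts reported;
# seeing only NThr:1 supports the claim that MKL ran sequentially:
MKL_VERBOSE=1 ./a.out 2>&1 | grep -o 'NThr:[0-9]*' | sort -u
```

If the application was truly linked against the sequential library, the verbose output should report NThr:1 even without the environment variable set.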
