Multi-threaded FFTW 3.1.2 on a shared-memory computer

Posted on 2024-08-04 12:14:53

I use FFTW 3.1.2 with Fortran to perform real-to-complex and complex-to-real FFTs. It works perfectly on one thread.

Unfortunately, I run into problems when I use the multi-threaded FFTW
on a 32-CPU shared-memory computer. I have two plans,
one for 9 real-to-complex FFTs and one for 9 complex-to-real FFTs (size
of each real field: 512*512). I use Fortran and compile my code with
ifort, using the following link options:

-lfftw3f_threads -lfftw3f -lm -lguide -lpthread -mp

The program seems to compile correctly, and the function sfftw_init_threads returns a non-zero integer value, usually 65527.
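
To make the setup concrete, the call sequence described above corresponds roughly to the following sketch. It is not the actual code from this post: the array names, the nthreads value, the FFTW_MEASURE flag, the use of a single "many" plan for the 9 fields, and the timing are all assumptions, and only the forward (real-to-complex) plan is shown. It uses the single-precision legacy Fortran interface (sfftw_ prefix) and assumes fftw3.f is on the include path.

```fortran
program threaded_r2c_sketch
  implicit none
  include 'fftw3.f'   ! FFTW constants such as FFTW_MEASURE

  ! Sizes taken from the post; nthreads is an assumed value.
  integer, parameter :: nx = 512, ny = 512, nfields = 9
  integer, parameter :: nthreads = 2
  integer :: iret, n(2), nembed_in(2), nembed_out(2)
  integer :: t0, t1, rate
  integer*8 :: plan_fwd
  real, allocatable :: u(:,:,:)       ! hypothetical array names
  complex, allocatable :: uhat(:,:,:)

  allocate(u(nx, ny, nfields), uhat(nx/2 + 1, ny, nfields))
  n          = (/ nx, ny /)
  nembed_in  = (/ nx, ny /)
  nembed_out = (/ nx/2 + 1, ny /)

  ! Thread setup has to happen before any plan is created.
  call sfftw_init_threads(iret)
  if (iret == 0) stop 'sfftw_init_threads failed'
  call sfftw_plan_with_nthreads(nthreads)

  ! One plan covering all nfields contiguous 2D fields.
  call sfftw_plan_many_dft_r2c(plan_fwd, 2, n, nfields, &
       u, nembed_in, 1, nx*ny, &
       uhat, nembed_out, 1, (nx/2 + 1)*ny, FFTW_MEASURE)

  ! FFTW_MEASURE may overwrite the arrays, so fill them after planning.
  u = 1.0

  ! Wall-clock timing; CPU time would add up the time of all threads.
  call system_clock(t0, rate)
  call sfftw_execute(plan_fwd)
  call system_clock(t1)
  print *, 'wall time (s):', real(t1 - t0) / real(rate)

  call sfftw_destroy_plan(plan_fwd)
  call sfftw_cleanup_threads()
  deallocate(u, uhat)
end program threaded_r2c_sketch
```

Running such a sketch with nthreads set to 1, 2, 4 and so on, and comparing the wall-clock times, would give a small reproducible case that is independent of the rest of the program.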

However, even though the program runs and gives correct results, it is
slower with 2 or more threads than with one. The top command shows a
strange CPU load larger than 100% (and much larger than n_threads*100).
The htop command shows that one processor (let's say number 1) is working
at a 100% load on the program, while ALL the other processors, including
number 1, are listed as working on this very same program at a 0% load, 0% memory and 0 TIME.

If anybody has any idea of what's going on here... thanks a lot!
