犰狳 C++ ifft 性能不佳
我有当前的测试代码,
#include <iostream>
#define ARMA_DONT_USE_WRAPPER
#include <armadillo>
using namespace std::complex_literals;
int main()
{
arma::cx_mat testMat { };
testMat.set_size(40, 19586);
auto nPositions = static_cast<arma::sword>(floor(19586/2));
arma::cx_rowvec a_vec {19586, arma::fill::randu};
arma::cx_rowvec b_vec {19586, arma::fill::randu};
arma::cx_rowvec c_vec {19586, arma::fill::randu};
for (size_t nCo=0; nCo < 3; nCo++) {
arma::rowvec d {19586, arma::fill::randu};
for(size_t iDop = 0; iDop < 40; ++iDop)
{
arma::cx_rowvec signalFi = (b_vec % arma::exp(-1i*M_PI*a_vec));
testMat.row(iDop) += arma::ifft(arma::shift(arma::fft(signalFi), nPositions).eval() % c_vec).eval();
}
}
return 0;
}
我正在尝试执行一些计算。 每次迭代的秒表共享性能约为:300毫秒,这对我的需求来说性能很差。
有人可以解释我做错了什么或一些技巧如何提高性能。
I used .eval() to perform 'eager' evaluation.
gcc 11.2
armadillo 10.8.2
Release Mode -O3
更新版本。可以重新设计 ifft 函数吗? 测试代码
#include <iostream>
#include <fftw3.h>
#include <armadillo>
#include "StopWatch.h"
using namespace std;
inline arma::cx_mat ifftshift(arma::cx_mat const &axx)
{
return arma::shift(axx, -ceil(axx.n_rows/2), 0);
}
void ifft(arma::cx_mat &inMat, arma::cx_mat &outMat)
{
size_t N = inMat.n_rows;
size_t n_cols = inMat.n_cols;
for (size_t index = 0; index < n_cols; ++index)
{
fftw_complex *in1 = reinterpret_cast<fftw_complex *>(inMat.colptr(index));
fftw_complex *out1 = reinterpret_cast<fftw_complex *>(outMat.colptr(index));
fftw_plan pl_ifft_cx1 = fftw_plan_dft_1d(N, in1, out1, FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute_dft(pl_ifft_cx1, in1, out1);
}
outMat /= N;
}
int main()
{
arma::cx_mat B;
B << std::complex<double>(+1.225e-01,+8.247e-01) << std::complex<double>(+4.078e-01,+5.632e-01) << std::complex<double>(+8.866e-01,+8.386e-01) << arma::endr
<< std::complex<double>(+5.958e-01,+1.015e-01) << std::complex<double>(+7.857e-01,+4.267e-01) << std::complex<double>(+7.997e-01,+9.176e-01) << arma::endr
<< std::complex<double>(+1.877e-01,+3.378e-01) << std::complex<double>(+2.921e-01,+9.651e-01) << std::complex<double>(+1.056e-01,+6.901e-01) << arma::endr
<< std::complex<double>(+2.322e-01,+6.990e-01) << std::complex<double>(+1.547e-01,+4.256e-01) << std::complex<double>(+9.094e-01,+1.194e-01) << arma::endr
<< std::complex<double>(+3.917e-01,+3.886e-01) << std::complex<double>(+2.166e-01,+4.962e-01) << std::complex<double>(+9.777e-01,+4.464e-01) << arma::endr;
arma::cx_mat output(5,3);
arma::cx_mat shifted = ifftshift(B);
arma::cx_mat arma_result = arma::ifft(shifted);
B.print("B");
arma_result.print("arma_result");
ifft(shifted, output);
output.print("output");
return 0;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我刚刚用我自己的库尝试了类似的操作,根据我的测量,您认为循环的每次迭代不应超过 1 毫秒(而不是 300 毫秒),这是正确的。
这是等效的代码,抱歉,这不是犰狳答案,我只是指出最小化操作和分配的具体目标是什么。
完整的代码和库位于: https://gitlab.com/correaa/boost-multi/-/blob/master/adaptors/fftw/test/shift.cpp#L45-58(额外的代码用于计时测量)。
同样能说明问题的是,我试图犯所有可能的错误来悲观代码。
为了尝试模仿我认为犰狳所做的“错误”......在循环内分配并始终进行复制。但我得到的是每次迭代需要 1.5 毫秒。
我的结论是,您的犰狳用法或库本身存在严重错误。
I just tried a similar operation with my own library and, according to my measurements, you are correct that each iteration of the loop shouldn't take more than 1 millisecond (instead of 300 ms).
This is the equivalent code, sorry that this is not an Armadillo answer, I am just pointing out what are the concrete goals for minimizing operations and allocations.
The full code and library is here: https://gitlab.com/correaa/boost-multi/-/blob/master/adaptors/fftw/test/shift.cpp#L45-58 (the extra code is for the timing measurement).
What is also telling is that I tried to do all the possible mistakes to pessimize the code.
To try to mimic what I think Armadillo is doing "wrong"... allocating inside the loop and making copies all the time. But what I get is that each iteration take 1.5 milliseconds.
My conclusion is that something is terribly wrong in your Armadillo usage or in the library itself.