Discrepancy between FFTW and CUFFT output

Posted on 2024-10-21 05:17:14

In the chart I have posted below, I am comparing the results of an IFFT run in FFTW and in CUFFT.

What are the possible reasons the two results come out different? Is it really THAT much round-off error?

Here is the relevant code snippet:

cufftHandle plan;
cufftComplex *d_data;
cufftComplex *h_data;
cudaMalloc((void**)&d_data, sizeof(cufftComplex)*W);

complex<float> *temp = (complex<float>*)fftwf_malloc(sizeof(fftwf_complex) * W);
h_data = (cufftComplex *)malloc(sizeof(cufftComplex)*W);
memset(h_data, 0, W*sizeof(cufftComplex));

/* Create a 1D FFT plan. */
cufftPlan1d(&plan, W, CUFFT_C2C, 1);

if (!reader->getData(rowBuff, row))    
    return 0;

// copy from read buffer to our FFT input buffer    
memcpy(indata, rowBuff, fCols * sizeof(complex<float>));

for(int c = 0; c < W; c++)
    h_data[c] = make_cuComplex(indata[c].real(), indata[c].imag());

cutilSafeCall(cudaMemcpy(d_data, h_data, W* sizeof(cufftComplex), cudaMemcpyHostToDevice));
cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);
cutilSafeCall(cudaMemcpy(h_data, d_data,W * sizeof(cufftComplex), cudaMemcpyDeviceToHost));

for(int c = 0; c < W; c++)
    temp[c] =(cuCrealf(h_data[c]), cuCimagf(h_data[c]));

//execute ifft plan on "indata"
fftwf_execute(ifft);
 ...
 //dump out abs() values of the first 50 temp and outdata values. Had to convert h_data back to a normal complex
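
One detail in the snippet above may be worth checking. In `temp[c] =(cuCrealf(h_data[c]), cuCimagf(h_data[c]));` the parentheses hold a comma expression rather than a constructor call, so only the second value is kept and it ends up in the real part. Whether this affects the plotted values depends on which array was actually dumped, so treat it as a possible cause rather than a diagnosis. The sketch below uses plain `std::complex<float>` and a made-up sample value (no CUDA or FFTW required); `from_comma`, `from_ctor` and `print_demo` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstdio>

// Hypothetical helper (not from the original post): assigns through a
// parenthesized comma expression, like the copy-back loop in the question.
std::complex<float> from_comma(float re, float im) {
    std::complex<float> z;
    z = (re, im);  // comma operator: re is evaluated and discarded, so z == (im, 0)
    return z;
}

// Hypothetical helper: constructs the value explicitly from both parts.
std::complex<float> from_ctor(float re, float im) {
    return std::complex<float>(re, im);
}

// Prints both results for the made-up sample 3 + 4i.
void print_demo() {
    const std::complex<float> a = from_comma(3.0f, 4.0f);
    const std::complex<float> b = from_ctor(3.0f, 4.0f);
    assert(a != b);  // the two forms do not store the same value
    std::printf("comma: (%g, %g) |z| = %g\n", a.real(), a.imag(), std::abs(a));
    std::printf("ctor:  (%g, %g) |z| = %g\n", b.real(), b.imag(), std::abs(b));
}
```

Calling `print_demo()` from a `main` function shows the two stored values side by side. Most compilers also warn about the unused left operand of the comma when warnings are enabled.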

ifft was defined like so:

ifft = fftwf_plan_dft_1d(freqCols, reinterpret_cast<fftwf_complex*>(indata),
                         reinterpret_cast<fftwf_complex*>(outdata), 
                         FFTW_BACKWARD, FFTW_ESTIMATE);

To generate the graph, I dumped out h_data and outdata after the fftwf_execute call.
W is the width of the image row I am processing.

See anything glaringly obvious?

[Image: graph comparing the FFTW and CUFFT IFFT outputs]
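
As a rough way to judge whether round-off alone could explain a visible gap, the same unnormalized inverse DFT can be computed in `float` and in `double` and the two results compared. This is a minimal sketch under stated assumptions: it uses a naive O(n^2) transform instead of FFTW or CUFFT, a made-up test signal, and hypothetical helper names (`naive_idft`, `max_relative_difference`, `float_vs_double_idft_error`). Both libraries compute unnormalized transforms, so scaling should not differ between them. The post does not show whether `W`, `freqCols` and `fCols` are equal, so the transform lengths may also be worth comparing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Unnormalized inverse DFT computed directly from the definition in
// precision T. O(n^2): a small reference only, not a replacement for
// FFTW or CUFFT.
template <typename T>
std::vector<std::complex<T>> naive_idft(const std::vector<std::complex<T>>& in) {
    const std::size_t n = in.size();
    const T two_pi = static_cast<T>(6.283185307179586476925286766559L);
    std::vector<std::complex<T>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<T> sum(0, 0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n so the angle stays within one period.
            const T angle = two_pi * static_cast<T>((j * k) % n) / static_cast<T>(n);
            sum += in[j] * std::complex<T>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Largest absolute difference between a and b, scaled by the largest
// magnitude found in b.
double max_relative_difference(const std::vector<std::complex<float>>& a,
                               const std::vector<std::complex<double>>& b) {
    assert(a.size() == b.size());
    double max_diff = 0.0;
    double max_mag = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        const std::complex<double> ai(a[i].real(), a[i].imag());
        max_diff = std::max(max_diff, std::abs(ai - b[i]));
        max_mag = std::max(max_mag, std::abs(b[i]));
    }
    return max_mag > 0.0 ? max_diff / max_mag : max_diff;
}

// Runs the same inverse DFT in float and in double on a made-up signal
// and returns how far apart the two results are.
double float_vs_double_idft_error(std::size_t n) {
    std::vector<std::complex<float>> xf(n);
    std::vector<std::complex<double>> xd(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float re = std::sin(0.1f * static_cast<float>(i));
        const float im = std::cos(0.37f * static_cast<float>(i));
        xf[i] = std::complex<float>(re, im);
        xd[i] = std::complex<double>(re, im);  // identical input values
    }
    return max_relative_difference(naive_idft(xf), naive_idft(xd));
}
```

If a call such as `float_vs_double_idft_error(256)` returns a value several orders of magnitude smaller than the gap seen between the two library outputs, round-off is unlikely to be the whole explanation. The same `max_relative_difference` comparison could be applied to the dumped `h_data` and `outdata` values after converting them to the matching types.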
