Differences between FFTW and CUFFT output
In the chart I have posted below, I am comparing the results of an IFFT run in FFTW and in CUFFT.
What are the possible reasons these come out different? Is it really that much round-off error?
Here is the relevant code snippet:
cufftHandle plan;
cufftComplex *d_data;
cufftComplex *h_data;
cudaMalloc((void**)&d_data, sizeof(cufftComplex) * W);
complex<float> *temp = (complex<float>*)fftwf_malloc(sizeof(fftwf_complex) * W);
h_data = (cufftComplex *)malloc(sizeof(cufftComplex) * W);
memset(h_data, 0, W * sizeof(cufftComplex));
/* Create a 1D FFT plan. */
cufftPlan1d(&plan, W, CUFFT_C2C, 1);
if (!reader->getData(rowBuff, row))
    return 0;
// copy from read buffer to our FFT input buffer
memcpy(indata, rowBuff, fCols * sizeof(complex<float>));
for (int c = 0; c < W; c++)
    h_data[c] = make_cuComplex(indata[c].real(), indata[c].imag());
cutilSafeCall(cudaMemcpy(d_data, h_data, W * sizeof(cufftComplex), cudaMemcpyHostToDevice));
cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);
cutilSafeCall(cudaMemcpy(h_data, d_data, W * sizeof(cufftComplex), cudaMemcpyDeviceToHost));
for (int c = 0; c < W; c++)
    temp[c] = (cuCrealf(h_data[c]), cuCimagf(h_data[c]));
// execute ifft plan on "indata"
fftwf_execute(ifft);
...
// dump out abs() values of the first 50 temp and outdata values. Had to convert h_data back to a normal complex
The ifft plan was defined like so:
ifft = fftwf_plan_dft_1d(freqCols, reinterpret_cast<fftwf_complex*>(indata),
                         reinterpret_cast<fftwf_complex*>(outdata),
                         FFTW_BACKWARD, FFTW_ESTIMATE);
To generate the graph, I dumped out h_data and outdata after the fftwf_execute call.
W is the width of the row of the image I am processing.
See anything glaringly obvious?
Comments (1)
So it looks like CUFFT is returning both a real and an imaginary part, while FFTW returns only the real part. The cuCabsf() function that comes with the CUFFT complex library therefore gives me a multiple of sqrt(2) when I have both parts of the complex value.
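A separate detail in the posted conversion loop may also be worth checking. In C++, a parenthesized pair such as `(a, b)` is a comma expression that evaluates to `b`, so `temp[c] = (cuCrealf(...), cuCimagf(...))` assigns only the imaginary value, and it lands in the real part. Whether this affected the plotted values depends on which buffer was actually dumped. The sketch below illustrates the effect; `Float2` is a hypothetical stand-in for `cufftComplex` so that it compiles without the CUDA headers, and both function names are made up for illustration.

```cpp
#include <complex>

// Hypothetical stand-in for cufftComplex (a float2: .x is real, .y is imaginary),
// used here only so the sketch builds without the CUDA toolkit.
struct Float2 { float x; float y; };

// Mirrors the assignment in the question: the parentheses form a comma
// expression, so the real value is discarded and the imaginary value is
// assigned as a plain float, giving {imag, 0}.
std::complex<float> convert_as_posted(Float2 v) {
    std::complex<float> out;
    out = (v.x, v.y);
    return out;
}

// Constructs the complex number from both parts explicitly.
std::complex<float> convert_both_parts(Float2 v) {
    return std::complex<float>(v.x, v.y);
}
```

For an input of 3 + 4i, the first function yields 4 + 0i and the second yields 3 + 4i, so the magnitudes differ (4 versus 5). Most compilers flag the first form with an unused-value warning.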
As an aside: I have never been able to get exactly matching results in the intermediate steps between FFTW and CUFFT. If you do both the IFFT and the FFT, though, you should get something close.
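When checking such a round trip, it helps to remember that both FFTW and cuFFT document their transforms as unnormalized: a backward transform followed by a forward transform returns the input multiplied by the length N. The sketch below shows that scaling and a tolerance-based comparison. It uses a naive O(N^2) DFT as a stand-in for either library, and the names `naive_dft`, `round_trip`, and `max_rel_error` are hypothetical helpers, not part of FFTW or cuFFT.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Naive unnormalized DFT. sign = -1 is the forward transform, sign = +1 the
// backward one, the same exponent signs FFTW_FORWARD / FFTW_BACKWARD use.
std::vector<cfloat> naive_dft(const std::vector<cfloat>& in, int sign) {
    const std::size_t n = in.size();
    const double two_pi = 6.283185307179586;
    std::vector<cfloat> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);  // accumulate in double precision
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * two_pi * double((j * k) % n) / double(n);
            acc += std::complex<double>(in[j]) * std::polar(1.0, angle);
        }
        out[k] = cfloat(acc);
    }
    return out;
}

// Backward then forward transform, divided by N to undo the missing scaling.
std::vector<cfloat> round_trip(const std::vector<cfloat>& in) {
    std::vector<cfloat> out = naive_dft(naive_dft(in, +1), -1);
    for (cfloat& v : out) v /= static_cast<float>(in.size());
    return out;
}

// Largest element-wise difference, relative to the largest reference magnitude.
float max_rel_error(const std::vector<cfloat>& a, const std::vector<cfloat>& ref) {
    assert(a.size() == ref.size());
    float max_diff = 0.0f, max_ref = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        max_diff = std::max(max_diff, std::abs(a[i] - ref[i]));
        max_ref = std::max(max_ref, std::abs(ref[i]));
    }
    return max_ref > 0.0f ? max_diff / max_ref : max_diff;
}
```

Comparing two implementations with something like `max_rel_error(a, b) < tol` rather than exact equality is the usual approach, since different FFT algorithms order their floating-point operations differently.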