FFT convolution - really low PSNR

Posted on 2024-11-30 20:30:32

I'm convolving a 512*512 image with an FFT filter (kernel size 10), and the result looks good.

But when I compare it with an image that I convolved the normal way, the result is horrible.
The PSNR is about 35.

67,187 of 262,144 pixel values differ by 1 or more (peak at ~8), with a maximum pixel value of 255.

My question: is this normal when convolving in frequency space, or might there be a problem with my convolution/transform functions? The strange thing is that I should get better results when using double as the data type, but the result stays COMPLETELY the same.

When I transform an image into frequency space, DON'T convolve it, and then transform it back, it's fine: the PSNR is about 140 when using float.

Also, since the pixel differences are only 1-10, I think I can rule out scaling errors.

EDIT: More Details for ~~bored~~ interested people

I use the open source KissFFT library with real two-dimensional input (kiss_fftndr.h).

My image data type is PixelMatrix: simply a matrix of pixels with alpha, red, green and blue values stored as floats from 0.0 to 1.0.

My kernel is also a PixelMatrix.

Here are some snippets from the convolution function.

Data types used:

#define kiss_fft_scalar float
typedef struct {
    kiss_fft_scalar r;
    kiss_fft_scalar i;
} kiss_fft_cpx;

Configuration of the FFT:

//parameters to kiss_fftndr_alloc:
//1st param = array with the size of the 2 dimensions (in my case dim={width, height})
//2nd param = count of the dimensions (in my case 2)
//3rd param = 0 or 1 (forward or inverse FFT)
//4th and 5th params are not relevant

kiss_fftndr_cfg stf = kiss_fftndr_alloc(dim, 2, 0, 0, 0);
kiss_fftndr_cfg sti = kiss_fftndr_alloc(dim, 2, 1, 0, 0);

Padding and transforming the kernel:

I make a new array:

kiss_fft_scalar kernel[width*height];

I fill it with 0 in a loop.

Then I fill the middle of this array with the kernel I want to use.
So if I used a 2*2 kernel with the values 1/4, 1/4, 1/4 and 1/4, it would look like this:

0  0   0  0
0 1/4 1/4 0
0 1/4 1/4 0
0  0   0  0

The zeros are padded until the array reaches the size of the image.

Then I swap the quadrants of this padded kernel diagonally. It looks like this:

1/4 0 0 1/4
 0  0 0  0
 0  0 0  0
1/4 0 0 1/4

Now I transform it:

kiss_fftndr(stf, kernel, outkernel);

outkernel is declared as

kiss_fft_cpx *outkernel = new kiss_fft_cpx[width*height];
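The padding step is where a small offset can creep in, so here is a minimal sketch of it in isolation. `padKernelWrapped` is a hypothetical helper, not part of KissFFT, and the anchor `(ax, ay)` is an assumption: it has to be the same kernel element that the direct convolution aligns with the output pixel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of KissFFT): copy a k*k kernel into a
// zero-filled width*height buffer so that the kernel element at the anchor
// (ax, ay) lands at index (0, 0) and the remaining elements wrap around
// the borders of the buffer.
std::vector<float> padKernelWrapped(const std::vector<float>& kernel, int k,
                                    int width, int height, int ax, int ay) {
    std::vector<float> padded(static_cast<std::size_t>(width) * height, 0.0f);
    for (int y = 0; y < k; ++y) {
        for (int x = 0; x < k; ++x) {
            // Offset relative to the anchor, wrapped into [0, size).
            int px = ((x - ax) % width + width) % width;
            int py = ((y - ay) % height + height) % height;
            padded[static_cast<std::size_t>(py) * width + px] =
                kernel[static_cast<std::size_t>(y) * k + x];
        }
    }
    return padded;
}
```

For the 2*2 example with a 4*4 buffer, the anchor (1, 1) reproduces the corner layout shown above, while the anchor (0, 0) keeps the four values in the top-left corner. In a circular convolution the element stored at index (0, 0) is the one multiplied with the pixel at the output position, so the two choices give results that differ by a one-pixel shift.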

Getting the colors into arrays:

kiss_fft_scalar *red = new kiss_fft_scalar[width*height];
kiss_fft_scalar *green = new kiss_fft_scalar[width*height];
kiss_fft_scalar *blue = new kiss_fft_scalar[width*height];

for(int i=0; i<height; i++) {
 for(int j=0; j<width; j++) {
  // row-major index: row i, column j (same as i*height+j for square images)
  red[i*width+j] = input.get(j,i).getRed();  //input is the input image pixel matrix
  green[i*width+j] = input.get(j,i).getGreen();
  blue[i*width+j] = input.get(j,i).getBlue();
 }
}

Then I transform the arrays:

kiss_fftndr(stf, red, outred);
kiss_fftndr(stf, green, outgreen);
kiss_fftndr(stf, blue, outblue);      //the out-arrays are type kiss_fft_cpx*

The convolution:

What we have now:

3 transformed color arrays of type kiss_fft_cpx*
1 transformed kernel array of type kiss_fft_cpx*

All of them are complex arrays.

Now comes the convolution:

// I do this for all 3 color arrays in my code (outred, outgreen, outblue);
// outcolor stands for any one of them.
for(int m=0; m<til; m++) {
 for(int n=0; n<til; n++) {
  kiss_fft_scalar real = outcolor[m*til+n].r;
  kiss_fft_scalar imag = outcolor[m*til+n].i;
  kiss_fft_scalar realMask = outkernel[m*til+n].r;
  kiss_fft_scalar imagMask = outkernel[m*til+n].i;

  // complex product (real + i*imag) * (realMask + i*imagMask)
  outcolor[m*til+n].r = real * realMask - imag * imagMask;
  outcolor[m*til+n].i = real * imagMask + imag * realMask;
 }
}
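The pointwise complex product in the frequency domain corresponds to a circular (wrap-around) convolution in the spatial domain. A small 1D sketch can confirm this independently of KissFFT. It uses a naive O(N^2) DFT, and all function names are made up for illustration; both inputs are assumed to have the same length.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT; sign = -1 for forward, +1 for inverse (unscaled).
std::vector<cplx> dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            double angle = sign * 2.0 * pi * static_cast<double>(k * t) /
                           static_cast<double>(n);
            sum += in[t] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Direct circular convolution: y[i] = sum_j x[j] * h[(i - j) mod n].
std::vector<double> circularConvolveDirect(const std::vector<double>& x,
                                           const std::vector<double>& h) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += x[j] * h[(i + n - j) % n];
    return y;
}

// Same result via the frequency domain: transform both inputs, multiply
// pointwise, transform back, and divide by n to undo the unscaled round trip.
std::vector<double> circularConvolveDft(const std::vector<double>& x,
                                        const std::vector<double>& h) {
    const std::size_t n = x.size();
    std::vector<cplx> cx(x.begin(), x.end()), ch(h.begin(), h.end());
    std::vector<cplx> fx = dft(cx, -1), fh = dft(ch, -1);
    for (std::size_t k = 0; k < n; ++k) fx[k] *= fh[k];
    std::vector<cplx> back = dft(fx, +1);
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i)
        y[i] = back[i].real() / static_cast<double>(n);
    return y;
}
```

If the two functions agree on a small test signal but the image results still differ, the multiplication itself is probably fine and the difference is more likely to come from kernel placement or border handling.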

Now I transform them back:

kiss_fftndri(sti, outred, red);
kiss_fftndri(sti, outgreen, green);
kiss_fftndri(sti, outblue, blue);

Then I create a new PixelMatrix with the values from the color arrays:

PixelMatrix output;

for(int i=0; i<height; i++) {
 for(int j=0; j<width; j++) {
  Pixel p;
  // divide by (width*height) because the forward + inverse FFT scales the data by that factor
  p.setRed( red[i*width+j] / (width*height) );
  p.setGreen( green[i*width+j] / (width*height) );
  p.setBlue( blue[i*width+j] / (width*height) );
  output.set(j , i , p);
 }
}

Notes:

I already make sure in advance that the image dimensions are powers of 2 (256*256, 512*512, etc.).

Example (kernel size 10): [three images in the original post: the input, the output of the FFT convolution, and the output of the normal convolution]

My console says:

142519 out of 262144 Pixels have a difference of 1 or more (maxRGB = 255)

PSNR: 32.006027221679688
MSE: 44.116752624511719

To my eyes, though, they look the same °.°

Maybe someone is bored and goes through the code. It's not urgent, but it's the kind of problem where I just want to know what the hell I did wrong ^^

Last but not least, my PSNR function, though I don't really think that's the problem :D

void calculateThePSNR(const PixelMatrix first, const PixelMatrix second, float* avgpsnr, float* avgmse) {

int height = first.getHeight();
int width = first.getWidth();

BMP firstOutput;
BMP secondOutput;

firstOutput.SetSize(width, height);
secondOutput.SetSize(width, height);

double rsum=0.0, gsum=0.0, bsum=0.0;
int count = 0;
int total = 0;
for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
        Pixel pixOne = first.get(j,i);
        Pixel pixTwo = second.get(j,i);

        double redOne = pixOne.getRed()*255;
        double greenOne = pixOne.getGreen()*255;
        double blueOne = pixOne.getBlue()*255;

        double redTwo = pixTwo.getRed()*255;
        double greenTwo = pixTwo.getGreen()*255;
        double blueTwo = pixTwo.getBlue()*255;

        firstOutput(j,i)->Red = redOne;
        firstOutput(j,i)->Green = greenOne;
        firstOutput(j,i)->Blue = blueOne;

        secondOutput(j,i)->Red = redTwo;
        secondOutput(j,i)->Green = greenTwo;
        secondOutput(j,i)->Blue = blueTwo;

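        // note: only the red channel is checked here, and only differences
        // strictly greater than 1 are counted
        if((redOne-redTwo) > 1.0 || (redOne-redTwo) < -1.0) {
            count++;
        }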
        total++;

        rsum += (redOne - redTwo) * (redOne - redTwo);
        gsum += (greenOne - greenTwo) * (greenOne - greenTwo);
        bsum += (blueOne - blueTwo) * (blueOne - blueTwo);

    }
}
fprintf(stderr, "%d out of %d Pixels have a difference of 1 or more (maxRGB = 255)\n", count, total);
double rmse = rsum/(height*width);
double gmse = gsum/(height*width);
double bmse = bsum/(height*width);

double rpsnr = 20 * log10(255/sqrt(rmse));
double gpsnr = 20 * log10(255/sqrt(gmse));
double bpsnr = 20 * log10(255/sqrt(bmse));

firstOutput.WriteToFile("test.bmp");
secondOutput.WriteToFile("test2.bmp");

system("display test.bmp");
system("display test2.bmp");

*avgmse = (rmse + gmse + bmse)/3;
*avgpsnr = (rpsnr + gpsnr + bpsnr)/3;
}
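For reference, the quantities this function computes per color channel $c$ can be written as follows, with $a_c$ and $b_c$ the two images scaled to 0..255 and $W$, $H$ the width and height (the symbols are introduced here only for illustration):

```latex
\mathrm{MSE}_c = \frac{1}{W H} \sum_{i=0}^{H-1} \sum_{j=0}^{W-1}
    \bigl( a_c(j,i) - b_c(j,i) \bigr)^2,
\qquad
\mathrm{PSNR}_c = 20 \log_{10} \frac{255}{\sqrt{\mathrm{MSE}_c}}
               = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_c}
```

The function then reports the arithmetic mean of the three per-channel values for both MSE and PSNR. Averaging the per-channel PSNR values is not the same as computing the PSNR of the averaged MSE, so the two reported numbers do not have to satisfy the formula above exactly.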

徒留西风 2024-12-07 20:30:32

Phonon had the right idea. Your images are shifted. If you shift your image by (1,1), then the MSE will be approximately zero (provided that you mask or crop the images accordingly). I confirmed this using the code (Python + OpenCV) below.

import cv
import sys
import math

def main():
    fname1, fname2 = sys.argv[1:]
    im1 = cv.LoadImage(fname1)
    im2 = cv.LoadImage(fname2)

    tmp = cv.CreateImage(cv.GetSize(im1), cv.IPL_DEPTH_8U, im1.nChannels)
    cv.AbsDiff(im1, im2, tmp)
    cv.Mul(tmp, tmp, tmp)
    mse = cv.Avg(tmp)
    print 'MSE:', mse

    psnr = [ 10*math.log(255**2/m, 10) for m in mse[:-1] ]
    print 'PSNR:', psnr

if __name__ == '__main__':
    main()

Output:

MSE: (0.027584912741602553, 0.026742391458366047, 0.028147870144492403, 0.0)
PSNR: [63.724087463606452, 63.858801190963192, 63.636348220531396]
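A minimal C++ sketch of the shift check described in this answer: compare one image against the other offset by (dx, dy), counting only pixels where both positions lie inside the image. `shiftedMse` is a hypothetical helper working on a single channel stored row-major; it is not the code that produced the numbers above, and which image to shift in which direction is something to try both ways.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: MSE between image a and image b shifted by (dx, dy),
// i.e. a(x, y) is compared with b(x + dx, y + dy). Pixels whose shifted
// position falls outside the image are skipped, which crops the border.
double shiftedMse(const std::vector<double>& a, const std::vector<double>& b,
                  int width, int height, int dx, int dy) {
    double sum = 0.0;
    std::size_t count = 0;
    for (int y = 0; y < height; ++y) {
        int sy = y + dy;
        if (sy < 0 || sy >= height) continue;
        for (int x = 0; x < width; ++x) {
            int sx = x + dx;
            if (sx < 0 || sx >= width) continue;
            double d = a[static_cast<std::size_t>(y) * width + x] -
                       b[static_cast<std::size_t>(sy) * width + sx];
            sum += d * d;
            ++count;
        }
    }
    // Return 0 when the shift leaves no overlapping pixels.
    return count ? sum / static_cast<double>(count) : 0.0;
}
```

Evaluating this for a few small offsets, such as every (dx, dy) in -1..1, and looking for the one where the MSE collapses is a quick way to tell a pure translation apart from a numerical problem.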

花想c 2024-12-07 20:30:32

My advice is to try implementing the following code:

A=double(inputS(1:10:length(inputS))); %segmentation 
A(:)=-A(:);
%process the image or signal with the fast Fourier transform and the inverse FFT
fresult=fft(inputS);
fresult(1:round(length(inputS)*2/fs))=0;
fresult(end-round(length(fresult)*2/fs):end)=0;
Y=real(ifft(fresult));

This code helps you obtain an image of the same size and is good for removing the DC component; after that you can do the convolution.
