Spatial-domain convolution does not equal frequency-domain multiplication with PyTorch

Published 2025-01-26 15:21:44


I want to verify that 2D convolution in the spatial domain is really multiplication in the frequency domain, so I used PyTorch to convolve an image with a 3×3 kernel (both real-valued). Then I transformed both the image and the kernel into the frequency domain, multiplied them, and transformed the result back to the spatial domain. Here is the result:
When the kernel is even or odd (i.e. purely real or purely imaginary in the frequency domain), the results of the two approaches match well. I compare the min and max of both results rather than their direct difference, because I am not sure whether some margin-alignment issue would distort a direct difference. Here are three runs each for even and odd kernels:

# Even Kernel
min max with s-domain conv: -0.03659552335739136 4.378755569458008
min max with f-domain mul: -0.0365956649184227 4.378755569458008

min max with s-domain conv: -1.2673343420028687 2.397951126098633
min max with f-domain mul: -1.2673344612121582 2.397951126098633

min max with s-domain conv: -8.185677528381348 0.22980886697769165
min max with f-domain mul: -8.185677528381348 0.22980868816375732

# Odd Kernel
min max with s-domain conv: -1.6630988121032715 1.6592578887939453
min max with f-domain mul: -1.663098692893982 1.6592577695846558

min max with s-domain conv: -3.483165979385376 3.4751217365264893
min max with f-domain mul: -3.483165979385376 3.475121259689331

min max with s-domain conv: -1.7972984313964844 1.7931475639343262
min max with f-domain mul: -1.7972984313964844 1.7931475639343262

But if the kernel is neither even nor odd, the difference is on another level entirely:

min max with s-domain conv: -2.3028392791748047 1.675748348236084
min max with f-domain mul: -2.5289478302001953 1.4919483661651611

min max with s-domain conv: -1.1227827072143555 3.0336122512817383
min max with f-domain mul: -1.1954418420791626 2.9853036403656006

min max with s-domain conv: -1.6867876052856445 5.575590133666992
min max with f-domain mul: -1.6832940578460693 5.688591957092285

I was wondering whether this comes from floating-point precision, but I tried torch's complex128 and it was no better. Is there something wrong with my implementation? Or is the discrepancy unavoidable when computing with complex numbers?
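(As a note on the double-precision attempt: casting the inputs right after they are created in the code below is enough, since rfft2 of a float64 tensor produces a complex128 spectrum:)

x, k = x.double(), k.double()  # all downstream FFTs then run in complex128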

Here is a simplified version of my code that reproduces this result.

import torch.nn.functional as F
import torch.fft as fft
import torch, cv2

# Load the image as grayscale and scale it to [0, 1]; x has shape (1, H, W).
img = cv2.imread('test.png', 0)

x = torch.as_tensor(img).unsqueeze(0)/255
k = torch.randn(1, 1, 3, 3)

# Uncomment one of the blocks below to force an even or odd kernel.
for i in range(k.size(0)):
    for j in range(k.size(1)):
        # For even k (symmetric under a 180-degree rotation)
        # for p in range(k.size(2)):
            # for q in range(k.size(3)):
                # k[i, j, p, q] = k[i, j, 2-p, 2-q]
        # For odd k (antisymmetric along each axis)
        # for p in range(k.size(2)):
            # k[i, j, p, 0] = -k[i, j, p, 2]
            # k[i, j, p, 1] = 0
        # for q in range(k.size(3)):
            # k[i, j, 0, q] = -k[i, j, 2, q]
            # k[i, j, 1, q] = 0
        pass

### Spatial domain convolution
# Zero-pad by one pixel on each side so the output keeps the input size.
padx = F.pad(x, [1, 1, 1, 1])
sdc = F.conv2d(padx.unsqueeze(0), k)

### Frequency domain convolution
# Transform input
fdx = fft.rfft2(x)
sdfdx = fft.irfft2(fdx)  # round-trip sanity check; not used below

# Transform kernel: zero-pad to the image size (assumes a square image),
# then roll so the kernel's center tap lands at index (0, 0).
size_diff = x.size(-1) - k.size(-1)
padk = torch.roll(F.pad(k, [0, size_diff, 0, size_diff]), (-1, -1), (-1, -2))
fdk = fft.rfft2(padk)

# Frequency domain multiplication (broadcasts over the batch dimension)
fdc = fdk * fdx
fdc = fdc.squeeze(0)

# Back to spatial domain
sdfdc = fft.irfft2(fdc)

### Compare
print("min max with s-domain conv:", sdc.min().item(), sdc.max().item())
print("min max with f-domain mul:", sdfdc.min().item(), sdfdc.max().item())


1 Comment

情域 2025-02-02 15:21:44


A wild guess, but better than no guess.

Try flipping your filter (rotating it 180°) before convolving.

Since you know the math: convolving without the filter flip is really a correlation (F.conv2d computes cross-correlation), so you flip the filter to turn that correlation into an actual convolution.
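A minimal sketch of that idea (assuming a square input, with a random stand-in for the image; variable names mirror the question's code): flip the kernel so F.conv2d's cross-correlation becomes a true convolution, and pad circularly so the spatial result matches the circular convolution that DFT multiplication implements. The two outputs then agree to float precision for an arbitrary kernel:

import torch
import torch.nn.functional as F
import torch.fft as fft

x = torch.rand(1, 64, 64)    # stand-in for the image
k = torch.randn(1, 1, 3, 3)  # arbitrary kernel, no symmetry imposed

# Spatial side: flip the kernel 180 degrees and pad circularly.
kf = torch.flip(k, [-2, -1])
padx = F.pad(x.unsqueeze(0), [1, 1, 1, 1], mode='circular')
sdc = F.conv2d(padx, kf)

# Frequency side: same as in the question.
fdx = fft.rfft2(x)
size_diff = x.size(-1) - k.size(-1)
padk = torch.roll(F.pad(k, [0, size_diff, 0, size_diff]), (-1, -1), (-1, -2))
sdfdc = fft.irfft2(fft.rfft2(padk) * fdx)

print((sdc - sdfdc).abs().max().item())  # ~1e-7: the two paths now agree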
