Gradient of an image in PyTorch - for Gradient Penalty calculation in WGAN
I am following this GitHub repo for the WGAN implementation with Gradient Penalty.
And I am trying to understand the following method, which does the job of unit-testing the gradient-penalty calculations.
import torch

# gradient_penalty and test_get_gradient come from the repo under discussion

def test_gradient_penalty(image_shape):
    # An all-zeros gradient has norm 0, so its penalty should be (0 - 1)^2 = 1
    bad_gradient = torch.zeros(*image_shape)
    bad_gradient_penalty = gradient_penalty(bad_gradient)
    assert torch.isclose(bad_gradient_penalty, torch.tensor(1.))

    # Number of elements per image: channel * height * width
    image_size = torch.prod(torch.Tensor(image_shape[1:]))
    # A constant gradient scaled so that each image's L2 norm is exactly 1
    good_gradient = torch.ones(*image_shape) / torch.sqrt(image_size)
    good_gradient_penalty = gradient_penalty(good_gradient)
    assert torch.isclose(good_gradient_penalty, torch.tensor(0.))

    random_gradient = test_get_gradient(image_shape)
    random_gradient_penalty = gradient_penalty(random_gradient)
    assert torch.abs(random_gradient_penalty - 1) < 0.1

# Now pass a tuple argument for the image dimensions of
# (batch_size, channel, height, width)
test_gradient_penalty((256, 1, 28, 28))
I don't understand the line below:
good_gradient = torch.ones(*image_shape) / torch.sqrt(image_size)
In the above, torch.ones(*image_shape) just fills a 4-D tensor with 1s, and torch.sqrt(image_size) just represents the value tensor(28.). So what I am trying to understand is why I need to divide the 4-D tensor by tensor(28.) to get the good_gradient.
If I print bad_gradient, it will be a 4-D tensor as below:
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          ...,
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.]]],
        ...])
If I print good_gradient, the output will be:
tensor([[[[0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357],
          [0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357],
          [0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357],
          ...,
          [0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357],
          [0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357],
          [0.0357, 0.0357, 0.0357, ..., 0.0357, 0.0357, 0.0357]]],
        ...])
1 Answer
For the line:
good_gradient = torch.ones(*image_shape) / torch.sqrt(image_size)
First, note that the Gradient Penalty term in WGAN is:
(norm(gradient(interpolated)) - 1)^2
And for the ideal gradient (i.e. a good gradient), this penalty term would be 0. That is, a good gradient is one whose gradient_penalty is as close to 0 as possible.
This means the following should hold, considering the L2-norm of the gradient:
(norm(gradient(x')) - 1)^2 = 0
i.e. norm(gradient(x')) = 1
i.e. sqrt(sum(gradient_i^2)) = 1
Now if you just continue simplifying the above math expression (considering how norm is calculated; see my note below), you will end up with good_gradient = torch.ones(*image_shape) / torch.sqrt(image_size): if every one of the N = 1 * 28 * 28 = 784 elements of an image's gradient has the same value c, the norm is sqrt(N * c^2) = sqrt(N) * c, and setting that equal to 1 gives c = 1 / sqrt(N) = 1/28 ≈ 0.0357.
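You can verify this numerically with a quick sketch (the shapes follow the test above):

import torch

image_shape = (256, 1, 28, 28)
image_size = torch.prod(torch.Tensor(image_shape[1:]))  # 1 * 28 * 28 = 784
print(torch.sqrt(image_size))                           # tensor(28.)

good_gradient = torch.ones(*image_shape) / torch.sqrt(image_size)
# Flatten each image's gradient and take its per-sample 2-norm: all equal 1
per_image_norm = good_gradient.view(image_shape[0], -1).norm(2, dim=1)
print(torch.allclose(per_image_norm, torch.ones(image_shape[0])))  # True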
Since you are passing image_shape as (256, 1, 28, 28), torch.sqrt(image_size) in your case is tensor(28.). Effectively, the above line divides each element of a 4-D tensor like [[[[1., 1., ...]]]] by the scalar tensor(28.).
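For context, here is a minimal sketch of what a gradient_penalty function consistent with these assertions could look like (an assumption based on the test's expectations, not necessarily the repo's exact code):

import torch

def gradient_penalty(gradient):
    # Flatten each sample's gradient into one long vector
    gradient = gradient.view(len(gradient), -1)
    # Per-sample L2 norm, then the mean squared distance from 1
    gradient_norm = gradient.norm(2, dim=1)
    return torch.mean((gradient_norm - 1) ** 2)

With this, the zeros tensor gives a per-sample norm of 0 and a penalty of 1, while good_gradient gives a per-sample norm of 1 and a penalty of 0, matching the two assertions in the test.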
Separately, note how norm is calculated: torch.norm without extra arguments performs what is called a Frobenius norm, which effectively reshapes the matrix into one long vector and returns the 2-norm of that. Given an M * N matrix, the Frobenius norm is defined as the square root of the sum of the squares of the elements of the matrix.
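A tiny example of that behavior (a sketch using arbitrary values):

import torch

x = torch.tensor([[1., 2.], [3., 4.]])
print(torch.norm(x))                  # tensor(5.4772), i.e. sqrt(1 + 4 + 9 + 16) = sqrt(30)
print(torch.sqrt(torch.sum(x ** 2)))  # the same value: the Frobenius norm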