Gradient descent reassignment in PyTorch
I was following a series of tutorials on YouTube about deep learning, and I encountered a problem which really confuses me.
    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

    w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

    def forward(x):
        return w * x

    def loss(y, y_predicted):
        return ((y - y_predicted) ** 2).mean()

    print(f'Prediction before training: f(5) = {forward(5):.3f}')

    learning_rate = 0.01
    epoch = 20

    for i in range(epoch):
        y_pred = forward(X)
        l = loss(Y, y_pred)
        l.backward()
        with torch.no_grad():
            w = w - learning_rate * w.grad
            # (w -= learning_rate * w.grad)  # would not cause an error on the following line
            w.grad.zero_()  # error: 'NoneType' object has no attribute 'zero_'
        if i % 1 == 0:
            print(f'weight : {w}, loss : {l}')
I really wonder about the difference between "w = w - learning_rate * w.grad" and "w -= learning_rate * w.grad", because in my experience these two are the same. Thanks!
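To make the difference concrete, here is a minimal sketch (the scalar loss and numbers below are made up for illustration, not from the tutorial) contrasting the two update forms:

```python
import torch

# A toy scalar loss so that w.grad gets populated.
w = torch.tensor(0.0, requires_grad=True)
loss = (w * 3 - 6) ** 2
loss.backward()  # dloss/dw = 2 * (3w - 6) * 3 = -36 at w = 0

# In-place update inside no_grad: w stays the same leaf tensor,
# so it keeps requires_grad=True and its .grad.
with torch.no_grad():
    w -= 0.01 * w.grad
print(w.requires_grad, w.grad)  # requires_grad is still True, grad still set

# Out-of-place update: the name w is rebound to a brand-new tensor
# created under no_grad, which has requires_grad=False and no .grad.
with torch.no_grad():
    w = w - 0.01 * w.grad
print(w.requires_grad, w.grad)  # requires_grad is False, grad is None
```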
2 Answers
As pointed out in the comments, the problem is in how PyTorch computes/stores gradients. In fact,

    w -= learning_rate * w.grad

is an in-place operation, which makes w keep its initial properties (requires_grad=True). Usually in PyTorch we avoid in-place operations, since they may break the computational graph used by autograd (see this PyTorch forum post).

But in your case, this:

    w = w - learning_rate * w.grad

is not in-place. Thus w is rebound to a new tensor, and because of the torch.no_grad() statement, this new tensor won't have a .grad attribute (it is None, hence the error on w.grad.zero_()).
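Putting this together, here is a sketch of a corrected version of the loop from the question, using the in-place update so that w stays a leaf tensor and w.grad remains available to zero out each iteration:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

learning_rate = 0.01
for i in range(20):
    y_pred = w * X
    l = ((Y - y_pred) ** 2).mean()
    l.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad  # in-place: w keeps requires_grad=True
        w.grad.zero_()               # w.grad still exists, so this no longer fails
    print(f'weight : {w:.4f}, loss : {l:.6f}')
```

With this version the weight steadily approaches the true value 2.0 over the 20 epochs instead of raising the NoneType error.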
Although operators work in Python with a specific behavior, this is not always true for the PyTorch framework. In the vanilla Python interpretation, the subtraction assignment

    A -= B

is equivalent to the

    A = A - B

syntax. Augmented-assignment operators like this can be used every time you want to assign a value to a variable; we call this syntactic sugar in Python, and it can be used to simplify your code. However, things work differently in the PyTorch framework.

In your case, the new w is assigned after l.backward() has been propagated. Hence, it has no grad value. Moreover, the assigned w has no requires_grad because of the torch.no_grad() context.

Now, let's take your example:

    w -= learning_rate * w.grad

In the PyTorch framework, it is equivalent to:

    w[...] = w - learning_rate * w.grad

Although w -= learning_rate * w.grad and w = w - learning_rate * w.grad might seem equivalent, they don't follow the same structure, so they promote different behaviors. On the one hand, the w -= learning_rate * w.grad expression is considered an in-place operation, meaning that it directly changes the content of the given tensor (e.g., w[...]) without making a copy; w therefore keeps its initial properties, including requires_grad=True. On the other hand, w = w - learning_rate * w.grad rebinds w to a brand-new tensor, so that state is not carried over.
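Incidentally, this in-place vs. rebinding distinction is not unique to PyTorch: for mutable objects, plain Python augmented assignment is also not pure syntactic sugar. A small illustrative sketch with lists (an analogy, not PyTorch semantics):

```python
# += on a list calls list.__iadd__ and mutates the existing object,
# so other references to that object see the change.
a = [1, 2]
alias = a
a += [3]        # in-place: mutates the list that alias also points to
print(alias)    # [1, 2, 3]

# a = a + [3] builds a new list and rebinds the name a,
# leaving the original object (and its aliases) untouched.
a = [1, 2]
alias = a
a = a + [3]     # rebinding: alias still refers to the old list
print(alias)    # [1, 2]
```

The same distinction is what makes `w -= ...` keep the original tensor (and its autograd state) while `w = w - ...` produces a fresh one.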