Gradient descent reassignment in PyTorch
I was following a series of tutorials on YouTube about deep learning, and I encountered a problem which really confuses me.
    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

    w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

    def forward(x):
        return w * x

    def loss(y, y_predicted):
        return ((y - y_predicted) ** 2).mean()

    print(f'Prediction before training: f(5) = {forward(5):.3f}')

    learning_rate = 0.01
    epoch = 20

    for i in range(epoch):
        y_pred = forward(X)
        l = loss(Y, y_pred)
        l.backward()
        with torch.no_grad():
            w = w - learning_rate * w.grad
            # (w -= learning_rate * w.grad)  # would not cause an error on the following line
            w.grad.zero_()  # error: 'NoneType' object has no attribute 'zero_'
        if i % 1 == 0:
            print(f'weight : {w}, loss : {l}')
I really wonder about the difference between "w = w - learning_rate * w.grad" and "w -= learning_rate * w.grad", because in my experience these two are the same. Thanks!
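To make the difference concrete, here is a minimal sketch (the scalar loss and numbers below are made up for illustration, not from the tutorial) contrasting the two update forms:

```python
import torch

# A toy scalar loss so that w.grad gets populated.
w = torch.tensor(0.0, requires_grad=True)
loss = (w * 3 - 6) ** 2
loss.backward()  # dloss/dw = 2 * (3w - 6) * 3 = -36 at w = 0

# In-place update inside no_grad: w stays the same leaf tensor,
# so it keeps requires_grad=True and its .grad.
with torch.no_grad():
    w -= 0.01 * w.grad
print(w.requires_grad, w.grad)  # requires_grad is still True, grad still set

# Out-of-place update: the name w is rebound to a brand-new tensor
# created under no_grad, which has requires_grad=False and no .grad.
with torch.no_grad():
    w = w - 0.01 * w.grad
print(w.requires_grad, w.grad)  # requires_grad is False, grad is None
```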
2 Answers
As pointed out in the comments, the problem is in how PyTorch computes/stores gradients. In fact,

    w -= learning_rate * w.grad

is an in-place operation, which makes w keep its initial properties (requires_grad=True). Usually in PyTorch we avoid in-place operations, since they may break the computational graph used by autograd (see this PyTorch forum post).

But in your case, this:

    w = w - learning_rate * w.grad

is not in-place. Thus w is rebound to a new tensor, and because of the torch.no_grad() statement, this new tensor won't have a .grad attribute (it is None, hence the error on w.grad.zero_()).
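Putting this together, here is a sketch of a corrected version of the loop from the question, using the in-place update so that w stays a leaf tensor and w.grad remains available to zero out each iteration:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

learning_rate = 0.01
for i in range(20):
    y_pred = w * X
    l = ((Y - y_pred) ** 2).mean()
    l.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad  # in-place: w keeps requires_grad=True
        w.grad.zero_()               # w.grad still exists, so this no longer fails
    print(f'weight : {w:.4f}, loss : {l:.6f}')
```

With this version the weight steadily approaches the true value 2.0 over the 20 epochs instead of raising the NoneType error.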
Although operators work in Python with a specific behavior, this is not always true for the PyTorch framework. In the vanilla Python interpretation, the subtraction assignment

    A -= B

is equivalent to the

    A = A - B

syntax. Augmented-assignment operators like this can be used every time you want to assign a value to a variable; we call this syntactic sugar in Python, and it can be used to simplify your code. However, things work differently in the PyTorch framework.

In your case, the new w is assigned after l.backward() has been propagated. Hence, it has no grad value. Moreover, the assigned w has no requires_grad because of the torch.no_grad() context.

Now, let's take your example:

    w -= learning_rate * w.grad

In the PyTorch framework, it is equivalent to:

    w[...] = w - learning_rate * w.grad

Although w -= learning_rate * w.grad and w = w - learning_rate * w.grad might seem equivalent, they don't follow the same structure, so they promote different behaviors. On the one hand, the w -= learning_rate * w.grad expression is considered an in-place operation, meaning that it directly changes the content of the given tensor (e.g., w[...]) without making a copy; w therefore keeps its initial properties, including requires_grad=True. On the other hand, w = w - learning_rate * w.grad rebinds w to a brand-new tensor, so that state is not carried over.
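Incidentally, this in-place vs. rebinding distinction is not unique to PyTorch: for mutable objects, plain Python augmented assignment is also not pure syntactic sugar. A small illustrative sketch with lists (an analogy, not PyTorch semantics):

```python
# += on a list calls list.__iadd__ and mutates the existing object,
# so other references to that object see the change.
a = [1, 2]
alias = a
a += [3]        # in-place: mutates the list that alias also points to
print(alias)    # [1, 2, 3]

# a = a + [3] builds a new list and rebinds the name a,
# leaving the original object (and its aliases) untouched.
a = [1, 2]
alias = a
a = a + [3]     # rebinding: alias still refers to the old list
print(alias)    # [1, 2]
```

The same distinction is what makes `w -= ...` keep the original tensor (and its autograd state) while `w = w - ...` produces a fresh one.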