Gradient descent reassignment in PyTorch

Posted on 2025-02-11 15:19:14


I was following a series of tutorials on YouTube about deep learning, and I encountered a problem that really confuses me.

import torch

X = torch.tensor([1,2,3,4], dtype = torch.float32)
Y = torch.tensor([2,4,6,8], dtype = torch.float32)

w = torch.tensor(0.0, dtype = torch.float32, requires_grad=True)

def forward(x):
  return w*x

def loss(y, y_predicted):
  return ((y-y_predicted)**2).mean()

print(f'Prediction before training: f(5) = {forward(5):.3f}')


learning_rate= 0.01
epoch = 20
for i in range(epoch):
  y_pred = forward(X)
  l = loss(Y, y_pred)
  l.backward()
  with torch.no_grad():
    w = w - learning_rate * w.grad
    # w -= learning_rate * w.grad  # using this line instead would not cause the error below


  w.grad.zero_() #error : 'NoneType' object has no attribute 'zero_'
  if i % 1 == 0:
    print(f'weight : {w}, loss : {l}')

I really wonder about the difference between "w = w - learning_rate * w.grad" and "w -= learning_rate * w.grad", because in my experience these two behave the same. Thanks!


Comments (2)

少女净妖师 2025-02-18 15:19:15


As pointed out in the comments, the problem lies in how PyTorch computes and stores gradients. In fact,

w -= learning_rate * w.grad

is an in-place operation, which lets w keep its initial properties (requires_grad=True). Usually in PyTorch we avoid in-place operations, as they may break the computational graph used by autograd (see this PyTorch forum post).
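
For illustration, here is a minimal sketch (the tensor names are made up for this example, not taken from the question) of the kind of failure an in-place operation can cause when the modified tensor is still needed by the backward pass:

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a.exp()          # autograd saves the output b to compute exp's gradient later
b.add_(1)            # in-place modification of a tensor needed for backward
b.sum().backward()   # RuntimeError: a variable needed for gradient computation
                     # has been modified by an inplace operation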

But for you, this:

w = w - learning_rate * w.grad

is not in-place: w is rebound to a brand-new tensor. Because that tensor is created inside the torch.no_grad() block, it has requires_grad=False and its .grad is None, which is why w.grad.zero_() fails with 'NoneType' object has no attribute 'zero_'.
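
Putting this together, a minimal corrected sketch of the loop (same setup as in the question, with the update kept in-place so w stays the same leaf tensor) could look like this:

import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

learning_rate = 0.01
for i in range(20):
  y_pred = w * X
  l = ((Y - y_pred) ** 2).mean()
  l.backward()                      # populates w.grad
  with torch.no_grad():
    w -= learning_rate * w.grad     # in-place update: w keeps requires_grad and .grad
  w.grad.zero_()                    # w.grad now exists and can be reset
  print(f'weight : {w.item():.3f}, loss : {l.item():.6f}')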

孤云独去闲 2025-02-18 15:19:15


Although Python operators have a specific, well-defined behavior, that behavior does not always carry over unchanged to the PyTorch framework. In vanilla Python (at least for immutable values such as numbers), the subtraction assignment A -= B is equivalent to the A = A - B syntax; it is what we call syntactic sugar, and it can be used to simplify your code. However, things work differently for PyTorch tensors.

In your case, a new w is created after l.backward() has been propagated, so it carries no grad value. Moreover, because the assignment happens inside the torch.no_grad() block, the new w has requires_grad=False.

Now, let's take your example. The in-place form

w -= learning_rate * w.grad

is, for a tensor, essentially equivalent to writing into the existing storage:

w[...] = w - learning_rate * w.grad

Both versions change the content of the tensor that w already refers to, without making a copy, so w remains the same leaf tensor with requires_grad=True and its accumulated .grad intact. The plain assignment w = w - learning_rate * w.grad does not follow the same structure: it builds a new tensor and rebinds the name w to it. Since that tensor is created inside the torch.no_grad() block, it has requires_grad=False and no gradient, which is why the later w.grad.zero_() call fails.
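
A small sketch (variable names chosen just for this demonstration) makes the contrast visible by checking the object identity and attributes of w after each style of update:

import torch

w = torch.tensor(0.0, requires_grad=True)
loss = (w * 2 - 4) ** 2
loss.backward()                    # w.grad is now populated

lr = 0.01
with torch.no_grad():
  original = id(w)

  w -= lr * w.grad                 # in-place: the same tensor object is modified
  print(id(w) == original)         # True
  print(w.requires_grad, w.grad)   # True, and the gradient tensor is still there

  w = w - lr * w.grad              # rebinding: a new tensor is created under no_grad
  print(id(w) == original)         # False
  print(w.requires_grad, w.grad)   # False, None -> w.grad.zero_() would fail here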
