How does optimizer.step() know about the model's most recent loss?
I am looking at an example from PyTorch of training a model:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
And I have a very basic question: the optimizer was never inserted into or defined on the model (similarly to model.compile in Keras), nor did it receive the loss or labels of the last batch or epoch. How does it "know" to perform the optimization step?
Comments (2)
On optimizer instantiation you pass the model's parameters: optimizer.step updates those parameters. Gradients are computed in the loss.backward() step, before calling the step method.
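To make that concrete, here is a minimal sketch of the part missing from the quoted snippet: the optimizer is constructed with the model's parameters, and that is its only link to the model (the layer shape and hyperparameter values below are placeholders, not taken from the question's code):

import torch.nn as nn
import torch.optim as optim

# A stand-in for the question's `net`; any nn.Module works the same way.
net = nn.Linear(10, 2)

# The optimizer receives the model's parameter tensors at construction time.
# It never sees the model, the loss, or the labels; only these tensors.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# It keeps references to the very same tensor objects the model uses:
assert all(p1 is p2 for p1, p2 in zip(net.parameters(),
                                      optimizer.param_groups[0]['params']))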
Rather than thinking about how the loss and the parameters are related, you should consider them as separate events which are not linked. Indeed, there are two distinct elements that have an effect on the parameters and their cached gradients.
The autograd mechanism (the process in charge of performing the gradient computation) allows you to call backward on a torch.Tensor (your loss), which will in turn backpropagate through all the tensors that were involved in computing that final tensor value. In doing so, it navigates what is called the computation graph, updating each parameter's gradient by changing its grad attribute. This means that at the end of a backward call, the network's learned parameters that were used to compute this output will have a grad attribute containing the gradient of the loss with respect to that parameter.
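A small runnable sketch of that behaviour (the model, shapes and data here are arbitrary placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)              # learned parameters: weight and bias
x, target = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), target)

print(model.weight.grad)             # None: no backward pass has run yet
loss.backward()                      # autograd walks the computation graph...
print(model.weight.grad.shape)       # ...and fills each parameter's .grad -> torch.Size([1, 4])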
The optimizer is independent of the backward pass, since it doesn't rely on it. You can call backward on your graph once, multiple times, or on different loss terms, depending on your use case. The optimizer's task is to take the parameters of the model independently (that is, irrespective of the network architecture or its computation graph) and update them using a given optimization routine (for example via Stochastic Gradient Descent, Root Mean Squared Propagation, etc.). It goes through all the parameters it was initialized with and updates them using their respective gradient values (which are supposed to have been stored in the grad attribute by at least one backpropagation).
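Conceptually, for plain SGD, optimizer.step() boils down to something like the following simplified sketch (the real implementation also handles momentum, weight decay and other options):

import torch

def sgd_step_sketch(params, lr):
    # Roughly what optimizer.step() does for vanilla SGD: walk the parameters
    # it was given at construction time and nudge each one using whatever
    # value is currently stored in its .grad attribute.
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad

Nothing in this loop refers to the loss, the labels, or the network architecture; the only coupling is through the parameters' grad attributes.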
Important notes:

Keep in mind, though, that the backward pass and the actual update call on the optimizer are linked only implicitly, by the fact that the optimizer will use the results computed by the preceding backward call.
In PyTorch, parameter gradients are kept in memory, so you have to clear them out before performing a new backward call. This is done using the optimizer's zero_grad function. In practice, it clears the grad attribute of the tensors it has registered as parameters.
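Both notes can be seen in a tiny experiment: gradients from successive backward calls accumulate in the grad attributes until zero_grad() resets them (the model and data below are arbitrary placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(5, 3), torch.randn(5, 1)

nn.functional.mse_loss(model(x), y).backward()
g1 = model.weight.grad.clone()

# A second backward call without zero_grad(): the new gradients are added
# on top of the ones already stored in .grad.
nn.functional.mse_loss(model(x), y).backward()
print(torch.allclose(model.weight.grad, 2 * g1))   # True: accumulation

optimizer.step()          # the update uses whatever is in .grad right now
optimizer.zero_grad()     # reset the .grad attributes for the next iteration
print(model.weight.grad)  # None (or a zero tensor on older PyTorch versions)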