How does optimizer.step() know about the model's most recent loss?
I am looking at an example from PyTorch of training a model:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
And I have a very basic question: the optimizer was never inserted into or defined on the model (similarly to model.compile in Keras), nor did it receive the loss or labels of the last batch or epoch. How does it "know" to perform the optimization step?
Comments (2)
On optimizer instantiation you pass the model's parameters: optimizer.step updates those parameters. Gradients are computed in the loss.backward() step, before calling the step method.
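To make that concrete, here is a minimal sketch of the part missing from the quoted snippet: the optimizer is constructed with the model's parameters, and that is its only link to the model (the layer shape and hyperparameter values below are placeholders, not taken from the question's code):

import torch.nn as nn
import torch.optim as optim

# A stand-in for the question's `net`; any nn.Module works the same way.
net = nn.Linear(10, 2)

# The optimizer receives the model's parameter tensors at construction time.
# It never sees the model, the loss, or the labels; only these tensors.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# It keeps references to the very same tensor objects the model uses:
assert all(p1 is p2 for p1, p2 in zip(net.parameters(),
                                      optimizer.param_groups[0]['params']))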
Rather than thinking about how the loss and the parameters are related, you should consider them as separate events which are not linked. Indeed, there are two distinct elements that have an effect on the parameters and their cached gradients.
The autograd mechanism (the process in charge of performing the gradient computation) allows you to call backward on a torch.Tensor (your loss), which will in turn backpropagate through all the tensors that were involved in computing that final tensor value. In doing so, it navigates what is called the computation graph, updating each parameter's gradient by changing its grad attribute. This means that at the end of a backward call, the network's learned parameters that were used to compute this output will have a grad attribute containing the gradient of the loss with respect to that parameter.
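A small runnable sketch of that behaviour (the model, shapes and data here are arbitrary placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)              # learned parameters: weight and bias
x, target = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), target)

print(model.weight.grad)             # None: no backward pass has run yet
loss.backward()                      # autograd walks the computation graph...
print(model.weight.grad.shape)       # ...and fills each parameter's .grad -> torch.Size([1, 4])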
The optimizer is independent of the backward pass, since it doesn't rely on it. You can call backward on your graph once, multiple times, or on different loss terms, depending on your use case. The optimizer's task is to take the parameters of the model independently (that is, irrespective of the network architecture or its computation graph) and update them using a given optimization routine (for example via Stochastic Gradient Descent, Root Mean Squared Propagation, etc.). It goes through all the parameters it was initialized with and updates them using their respective gradient values (which are supposed to have been stored in the grad attribute by at least one backpropagation).
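Conceptually, for plain SGD, optimizer.step() boils down to something like the following simplified sketch (the real implementation also handles momentum, weight decay and other options):

import torch

def sgd_step_sketch(params, lr):
    # Roughly what optimizer.step() does for vanilla SGD: walk the parameters
    # it was given at construction time and nudge each one using whatever
    # value is currently stored in its .grad attribute.
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad

Nothing in this loop refers to the loss, the labels, or the network architecture; the only coupling is through the parameters' grad attributes.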
Important notes:

Keep in mind, though, that the backward pass and the actual update call on the optimizer are linked only implicitly, by the fact that the optimizer will use the results computed by the preceding backward call.
In PyTorch, parameter gradients are kept in memory, so you have to clear them out before performing a new backward call. This is done using the optimizer's zero_grad function. In practice, it clears the grad attribute of the tensors it has registered as parameters.
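Both notes can be seen in a tiny experiment: gradients from successive backward calls accumulate in the grad attributes until zero_grad() resets them (the model and data below are arbitrary placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(5, 3), torch.randn(5, 1)

nn.functional.mse_loss(model(x), y).backward()
g1 = model.weight.grad.clone()

# A second backward call without zero_grad(): the new gradients are added
# on top of the ones already stored in .grad.
nn.functional.mse_loss(model(x), y).backward()
print(torch.allclose(model.weight.grad, 2 * g1))   # True: accumulation

optimizer.step()          # the update uses whatever is in .grad right now
optimizer.zero_grad()     # reset the .grad attributes for the next iteration
print(model.weight.grad)  # None (or a zero tensor on older PyTorch versions)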