How to update a part of torch.nn.Parameter

I want to update only part of the parameters defined by torch.nn.Parameter. I have tested the following three approaches, but only one of them works.

#(1)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.ones(4)
        self.P = torch.nn.Parameter(torch.ones(1))
        self.params[1] = self.P
    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# RuntimeError: Trying to backward through the graph a second time

#(2)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.P = torch.nn.Parameter(torch.ones(1))
    def forward(self, x):
        params = torch.ones(4)
        params[1] = self.P
        y = x * params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# It works, but the operations of Create and Assign are needed in each forward.

#(3)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.nn.Parameter(torch.ones(4))
    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
net.params[1].requires_grad = False
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# RuntimeError: you can only change requires_grad flags of leaf variables.

I would like to know how to update part of the parameters using approaches (1) and (3).


您的好友蓝忘机已上羡 2025-02-19 17:03:25

A small note on the use of requires_grad and nn.Parameter:

  1. If you have to freeze a sub-module of your nn.Module, you need to use requires_grad_. However, you cannot partially require gradients on a tensor (see the sketch after this list).

  2. An nn.Parameter is a wrapper that allows a given torch.Tensor to be registered inside an nn.Module. By default, the wrapped tensor requires gradient computation.
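
A minimal sketch of that distinction (the Linear layer here is only an illustration, not part of the question's code): freezing a whole sub-module with requires_grad_ works, while toggling requires_grad on a single element of a tensor does not.

import torch

# Freezing an entire sub-module is fine: every parameter it owns stops requiring gradients.
layer = torch.nn.Linear(4, 4)
layer.requires_grad_(False)

# requires_grad is a flag of the whole tensor, not of individual elements,
# so the following would raise the same RuntimeError as in snippet (3):
w = torch.nn.Parameter(torch.ones(4))
# w[1].requires_grad = False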

Your parameter tensor must therefore be defined as:

nn.Parameter(torch.ones(4))

And not as:

self.params = torch.ones(4)

Ultimately, you should check the content of your registered parameters with nn.Module#parameters before passing them to an optimizer.
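
For instance, a quick sketch of that check, reusing the NET class from snippet (2):

import torch

net = NET()  # NET as defined in snippet (2)
for name, p in net.named_parameters():
    print(name, p.shape, p.requires_grad)
# only the registered nn.Parameter shows up, e.g. P torch.Size([1]) True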

Your first snippet (#1) crashes because you are performing multiple backward passes through the same graph without explicitly setting retain_graph to True. The following process works fine:

for _ in range(10):
    optim.zero_grad()
    x = torch.rand(4) # new x
    loss = net(x)
    loss.backward()
    optim.step()
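
Alternatively, as hinted above, retain_graph can be passed to backward so that the graph built in __init__ by the assignment self.params[1] = self.P is not freed after the first iteration; a minimal sketch:

for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward(retain_graph=True)  # keep the graph created in __init__ alive
    optim.step()
    # note: self.params was filled from P once at construction time,
    # so it does not reflect the updates the optimizer applies to P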

Your second snippet (#2) is correct because you assign the tensor that requires a gradient into a different tensor inside forward. A minimal implementation to check that the gradient is indeed computed on P is as follows:

# reassign parameter tensor to bigger tensor
>>> P = torch.ones(1, requires_grad=True)
>>> params = torch.ones(4)
>>> params[1] = P

# inference and backpropagation
>>> (torch.rand(4)*params).sum().backward()
>>> P.grad
tensor([0.46701658])
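
The same check can be run through the module itself; a short sketch, reusing the NET class from snippet (2):

import torch

net = NET()        # NET as defined in snippet (2)
x = torch.rand(4)
net(x).backward()  # forward returns a scalar, so backward() needs no argument
print(net.P.grad)  # gradient w.r.t. the registered parameter, equal to x[1] here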

Your third snippet (#3) is invalid because you are trying to disable gradient computation on part of a tensor, which is not possible:

net.params[1].requires_grad = False # invalid

An alternative is instead to mask the gradient after backpropagation has been performed on the parameters:

import torch
from torch import nn

class NET(nn.Module):
    def __init__(self):
        super().__init__()
        self.params = nn.Parameter(torch.ones(4))

    def forward(self, x):
        y = x * self.params
        return y.sum()


net = NET()
net(torch.rand(4)).backward()
net.params.grad[1] = 0  # zero out the gradient of the entry you want to keep frozen
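
A possible way to plug this masking into the original training loop, assuming the NET class just above and the Adam setup from the question:

import torch

net = NET()  # NET as defined just above
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    net.params.grad[1] = 0  # mask the gradient of the frozen entry before the update
    optim.step()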