How to update a part of torch.nn.Parameter

I want to update only part of the parameters defined by torch.nn.Parameter. I have tested the following three approaches, but only one of them works.

#(1)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.ones(4)
        self.P = torch.nn.Parameter(torch.ones(1))
        self.params[1] = self.P
    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# RuntimeError: Trying to backward through the graph a second time

#(2)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.P = torch.nn.Parameter(torch.ones(1))
    def forward(self, x):
        params = torch.ones(4)
        params[1] = self.P
        y = x * params
        return y.sum()

net = NET()
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# It works, but the operations of Create and Assign are needed in each forward.

#(3)

import torch
class NET(torch.nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.params = torch.nn.Parameter(torch.ones(4))
    def forward(self, x):
        y = x * self.params
        return y.sum()

net = NET()
net.params[1].requires_grad = False
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    optim.step()

# RuntimeError: you can only change requires_grad flags of leaf variables.

I would like to know how to update part of the parameters using approaches (1) and (3).


您的好友蓝忘机已上羡 2025-02-19 17:03:25

A small note on the use of requires_grad and nn.Parameter:

  1. If you have to freeze a sub-module of your nn.Module, you need to use requires_grad_. However, you cannot partially require gradients on a tensor (see the sketch after this list).

  2. An nn.Parameter is a wrapper that allows a given torch.Tensor to be registered inside an nn.Module. By default, the wrapped tensor requires gradient computation.
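
A minimal sketch of that distinction (the Linear layer here is only an illustration, not part of the question's code): freezing a whole sub-module with requires_grad_ works, while toggling requires_grad on a single element of a tensor does not.

import torch

# Freezing an entire sub-module is fine: every parameter it owns stops requiring gradients.
layer = torch.nn.Linear(4, 4)
layer.requires_grad_(False)

# requires_grad is a flag of the whole tensor, not of individual elements,
# so the following would raise the same RuntimeError as in snippet (3):
w = torch.nn.Parameter(torch.ones(4))
# w[1].requires_grad = False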

Your parameter tensor must therefore be defined as:

nn.Parameter(torch.ones(4))

And not as:

self.params = torch.ones(4)

Ultimately, you should check the content of your registered parameters with nn.Module#parameters before passing them to an optimizer.
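
For instance, a quick sketch of that check, reusing the NET class from snippet (2):

import torch

net = NET()  # NET as defined in snippet (2)
for name, p in net.named_parameters():
    print(name, p.shape, p.requires_grad)
# only the registered nn.Parameter shows up, e.g. P torch.Size([1]) True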

Your first snippet (#1) crashes because you are performing multiple backward passes through the same graph without explicitly setting retain_graph to True. The following process works fine:

for _ in range(10):
    optim.zero_grad()
    x = torch.rand(4) # new x
    loss = net(x)
    loss.backward()
    optim.step()
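
Alternatively, as hinted above, retain_graph can be passed to backward so that the graph built in __init__ by the assignment self.params[1] = self.P is not freed after the first iteration; a minimal sketch:

for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward(retain_graph=True)  # keep the graph created in __init__ alive
    optim.step()
    # note: self.params was filled from P once at construction time,
    # so it does not reflect the updates the optimizer applies to P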

Your second snippet (#2) is correct because you assign the tensor that requires a gradient into a different tensor inside forward. A minimal implementation to check that the gradient is indeed computed on P is as follows:

# reassign parameter tensor to bigger tensor
>>> P = torch.ones(1, requires_grad=True)
>>> params = torch.ones(4)
>>> params[1] = P

# inference and backpropagation
>>> (torch.rand(4)*params).sum().backward()
>>> P.grad
tensor([0.46701658])
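
The same check can be run through the module itself; a short sketch, reusing the NET class from snippet (2):

import torch

net = NET()        # NET as defined in snippet (2)
x = torch.rand(4)
net(x).backward()  # forward returns a scalar, so backward() needs no argument
print(net.P.grad)  # gradient w.r.t. the registered parameter, equal to x[1] here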

Your third snippet (#3) is invalid because you are trying to disable gradient computation on part of a tensor, which is not possible:

net.params[1].requires_grad = False # invalid

An alternative is instead to mask the gradient after backpropagation has been performed on the parameters:

import torch
from torch import nn

class NET(nn.Module):
    def __init__(self):
        super().__init__()
        self.params = nn.Parameter(torch.ones(4))

    def forward(self, x):
        y = x * self.params
        return y.sum()


net = NET()
net(torch.rand(4)).backward()
net.params.grad[1] = 0  # zero out the gradient of the entry you want to keep frozen
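
A possible way to plug this masking into the original training loop, assuming the NET class just above and the Adam setup from the question:

import torch

net = NET()  # NET as defined just above
x = torch.rand(4)
optim = torch.optim.Adam(net.parameters(), lr=0.001)
for _ in range(10):
    optim.zero_grad()
    loss = net(x)
    loss.backward()
    net.params.grad[1] = 0  # mask the gradient of the frozen entry before the update
    optim.step()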