Can't retain grad on a tensor that has requires_grad=False, despite explicitly setting it to True

I am attempting to create an nn.Module in PyTorch. The set_model_params method is currently very messy, but I am trying to set requires_grad to True so that I can use retain_grad(). No matter where I put requires_grad=True, it tells me it is False:

class MyModule(nn.Module):
    def __init__(self, depth):
        super(MyModule, self).__init__()
        self.linear1 = nn.Linear(28*28, 50)
        self.linear3 = nn.Linear(50, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.tanh(self.linear1(x))
        return self.linear3(x).view(-1)
    
    def set_model_params(self, params_dict):
        self.linear1.weight.data = params_dict["W1"]
        self.linear1.weight.data.requires_grad=True
        self.linear1.weight.data.retain_grad()

        self.linear1.bias.data = params_dict["b1"]
        self.linear1.bias.data.requires_grad=True
        self.linear1.bias.data.retain_grad()

        self.linear3.weight.data = params_dict["W3"]
        self.linear3.weight.data.requires_grad = True
        self.linear3.weight.data.retain_grad()

        self.linear3.bias.data = params_dict["b3"]
        self.linear3.bias.data.requires_grad = True
        self.linear3.bias.data.retain_grad()
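
For reference, here is a minimal standalone sketch of the behavior I keep hitting (my own snippet, assuming a recent PyTorch version): every access to .data returns a fresh tensor that is detached from the autograd graph, so a flag set on one temporary does not survive to the next access.

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)

# .data hands back a new detached view on every access, so this sets
# requires_grad on a temporary that is immediately discarded:
layer.weight.data.requires_grad = True
print(layer.weight.data.requires_grad)  # False: a fresh detached tensor
print(layer.weight.requires_grad)       # True: the Parameter itself still tracks grad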

set_model_params exists because the maximum and minimum values of the network's parameters will change later on. I also set requires_grad=True when creating the tensors here:

def init_params_dict(d):
    params_dict = {
        "W1": torch.rand(50, 28 * 28, device="cuda", requires_grad=True) * (d * 2) - 1,
        "b1": torch.zeros(50, device="cuda", requires_grad=True),
        "W3": torch.rand(10, 50, device="cuda", requires_grad=True) * (d * 2) - 1,
        "b3": torch.zeros(10, device="cuda", requires_grad=True),
    }
    for i in range(8):
        params_dict['Wd' + str(i)] = torch.rand(50, 50, device="cuda", requires_grad=True) * (d * 2) - 1
        params_dict['bd' + str(i)] = torch.zeros(50, device="cuda", requires_grad=True)
    
    return params_dict
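
One thing I noticed while writing this (a small sketch of my own): because the scaling arithmetic runs after torch.rand(..., requires_grad=True), the dictionary values are non-leaf tensors rather than leaves.

import torch

t = torch.rand(50, 50, requires_grad=True)  # leaf tensor, created directly by the user
w = t * 2 - 1                               # non-leaf: the result of operations on t
print(t.is_leaf, w.is_leaf)                 # True False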

mymod = MyModule(8)
public_params = init_params_dict(0.01)

mymod.set_model_params(public_params)

Nonetheless, I get this error:

<ipython-input-88-52eb353b30c6> in set_model_params(self, params_dict)
     27         self.linear1.weight.data = params_dict["W1"]
     28         self.linear1.weight.data.requires_grad=True
---> 29         self.linear1.weight.data.retain_grad()
     30 
     31         self.linear1.bias.data = params_dict["b1"]

RuntimeError: can't retain_grad on Tensor that has requires_grad=False

How can I set requires_grad=True on one of these leaf variables and have it stick?
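
For what it is worth, the direction I am experimenting with (a sketch only, an assumption on my part rather than anything verified): avoid .data entirely and wrap the incoming tensors in nn.Parameter. A Parameter is a leaf with requires_grad=True by default, so .grad accumulates on it without any retain_grad() call.

import torch
import torch.nn as nn

# Sketch of a drop-in replacement for the set_model_params method above.
def set_model_params(self, params_dict):
    # detach().clone() severs any autograd history on the incoming tensors,
    # so each one becomes a proper leaf before being wrapped as a Parameter.
    self.linear1.weight = nn.Parameter(params_dict["W1"].detach().clone())
    self.linear1.bias = nn.Parameter(params_dict["b1"].detach().clone())
    self.linear3.weight = nn.Parameter(params_dict["W3"].detach().clone())
    self.linear3.bias = nn.Parameter(params_dict["b3"].detach().clone())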
