PyTorch Lightning (trainable params - wrong)

Posted on 2025-01-22 05:52:20

I am using multi-GPU training with PyTorch Lightning. The output below shows the model summary:

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
┏━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃    ┃ Name       ┃ Type              ┃ Params ┃
┡━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0  │ encoder    │ Encoder           │  2.0 M │
│ 1  │ classifier │ Sequential        │  8.8 K │
│ 2  │ criterion  │ BCEWithLogitsLoss │      0 │
│ 3  │ train_acc  │ Accuracy          │      0 │
│ 4  │ val_acc    │ Accuracy          │      0 │
│ 5  │ train_auc  │ AUROC             │      0 │
│ 6  │ val_auc    │ AUROC             │      0 │
│ 7  │ train_f1   │ F1Score           │      0 │
│ 8  │ val_f1     │ F1Score           │      0 │
│ 9  │ train_mcc  │ MatthewsCorrCoef  │      0 │
│ 10 │ val_mcc    │ MatthewsCorrCoef  │      0 │
│ 11 │ train_sens │ Recall            │      0 │
│ 12 │ val_sens   │ Recall            │      0 │
│ 13 │ train_spec │ Specificity       │      0 │
│ 14 │ val_spec   │ Specificity       │      0 │
└────┴────────────┴───────────────────┴────────┘
Trainable params: 2.0 M
Non-trainable params: 0

I have set the encoder to be untrainable using the code below:

ckpt = torch.load(chk_path)
self.encoder.load_state_dict(ckpt['state_dict'])
self.encoder.requires_grad = False

Shouldn't the trainable params be 8.8 K rather than 2.0 M?

My optimizer is the following:

optimizer = torch.optim.RMSprop(filter(lambda p: p.requires_grad, self.parameters()), lr=self.lr, weight_decay=self.weight_decay)
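
For reference, a quick way to cross-check what the summary reports is to count parameters directly by their requires_grad flag; a minimal sketch (hypothetical, not part of the original code, assuming it runs inside the LightningModule above):

# hypothetical check: count parameters by their requires_grad flag
trainable = sum(p.numel() for p in self.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in self.parameters() if not p.requires_grad)
print(f"trainable: {trainable:,}  frozen: {frozen:,}")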

Comments (3)

抹茶夏天i‖ 2025-01-29 05:52:20

self.encoder.requires_grad = False doesn't do anything; in fact, torch Modules don't have a requires_grad flag.

What you should do instead is use the requires_grad_ method (note the trailing underscore), which will set requires_grad for all the parameters of this module to the desired value:

self.encoder.requires_grad_(False)

as described here: https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.requires_grad_
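
A minimal sketch (using a hypothetical stand-in module, not the encoder from the question) showing the difference between the attribute assignment and the method call:

import torch

enc = torch.nn.Linear(4, 4)        # hypothetical stand-in for the encoder

enc.requires_grad = False          # plain attribute assignment on the Module
print(enc.weight.requires_grad)    # True -- the parameters are unaffected

enc.requires_grad_(False)          # Module.requires_grad_ propagates to every parameter
print(enc.weight.requires_grad)    # False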

岁月流歌 2025-01-29 05:52:20

You need to set requires_grad=False for all encoder parameters one by one:

for param in self.encoder.parameters():
    param.requires_grad = False
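
Applied to the loading code from the question, this would look roughly like the sketch below (chk_path and self.encoder are the names used in the original post):

ckpt = torch.load(chk_path)
self.encoder.load_state_dict(ckpt['state_dict'])

# freeze every encoder parameter individually so the optimizer filter
# and the Lightning model summary both see them as non-trainable
for param in self.encoder.parameters():
    param.requires_grad = False
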
贵在坚持 2025-01-29 05:52:20

Notice that if you execute the following piece of code:

import torch
import torch.nn.functional as F
from pytorch_lightning import LightningModule
from pytorch_lightning.utilities.model_summary import ModelSummary

class MNISTModel(LightningModule):
    def __init__(self):
        super().__init__()
        # layer sizes chosen so the summary matches the parameter counts shown below
        self.l1 = torch.nn.Linear(28 * 28, 1568)
        self.l2 = torch.nn.Linear(1568, 1568)
        self.l3 = torch.nn.Linear(1568, 10)

    def forward(self, x):
        x = torch.relu(self.l1(x.view(x.size(0), -1)))
        x = torch.relu(self.l2(x))
        return self.l3(x)

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

mnist_model = MNISTModel()
mnist_model.l2.requires_grad = False
print(mnist_model.l2.weight.requires_grad)
print(mnist_model.l2.bias.requires_grad)
ModelSummary(mnist_model)

You will get:

True
True

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 1.2 M 
1 | l2   | Linear | 2.5 M 
2 | l3   | Linear | 15.7 K
--------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.827    Total estimated model params size (MB)

which means that this is actually not deactivating requires_grad for the parameters in that layer. So, you have two options, according to https://pytorch.org/docs/stable/notes/autograd.html#setting-requires-grad:

  1. Applying .requires_grad_() to a module as suggested by @burzam (the more correct one)
mnist_model = MNISTModel()
mnist_model.l2.requires_grad_(False)
ModelSummary(mnist_model)
  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 1.2 M 
1 | l2   | Linear | 2.5 M 
2 | l3   | Linear | 15.7 K
--------------------------------
1.2 M     Trainable params
2.5 M     Non-trainable params
3.7 M     Total params
14.827    Total estimated model params size (MB)
  2. Loop through the parameters in the module
mnist_model = MNISTModel()
for param in mnist_model.l2.parameters():
    param.requires_grad = False

ModelSummary(mnist_model)

you will see:

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 1.2 M 
1 | l2   | Linear | 2.5 M 
2 | l3   | Linear | 15.7 K
--------------------------------
1.2 M     Trainable params
2.5 M     Non-trainable params
3.7 M     Total params
14.827    Total estimated model params size (MB)

You need to set requires_grad to False for all the parameters in the specific layers you want to deactivate.
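
Translated back to the model in the question, option 1 would look roughly like this (a sketch using the names from the original post):

ckpt = torch.load(chk_path)
self.encoder.load_state_dict(ckpt['state_dict'])
self.encoder.requires_grad_(False)   # note the trailing underscore

Depending on the encoder, you may also want to call self.encoder.eval() so that dropout and batch-norm statistics stay frozen as well.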
