PyTorch NN not training
I have a bespoke NN model which works, and I wanted to move it to the PyTorch framework. However, the network is not training, likely due to some misconfiguration. Please advise if you see anything odd/wrong that could be a contributing cause.
import torch
from torch import nn, optim
import torch.nn.functional as F

# X_train, X_test, y_train, y_test are assumed to be defined earlier (e.g. numpy arrays)
X_train_t = torch.tensor(X_train).float()
X_test_t = torch.tensor(X_test).float()
y_train_t = torch.tensor(y_train).long().reshape(y_train.shape[0], 1)
y_test_t = torch.tensor(y_test).long().reshape(y_test.shape[0], 1)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(22, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.log_softmax(self.fc2(x), dim=1)
        return x

model = Classifier()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.003)

epochs = 2000
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    # training loss
    optimizer.zero_grad()
    log_ps = model(X_train_t)
    loss = criterion(log_ps, y_train_t.type(torch.float32))
    loss.backward()
    optimizer.step()
    train_loss = loss.item()

    # test loss
    # Turn off gradients for validation, saves memory and computations
    with torch.no_grad():
        log_ps = model(X_test_t)
        test_loss = criterion(log_ps, y_test_t.to(torch.float32))
        ps = torch.exp(log_ps)

    train_losses.append(train_loss/len(X_train_t))
    test_losses.append(test_loss/len(X_test_t))

    if (e % 100 == 0):
        print("Epoch: {}/{}.. ".format(e, epochs),
              "Training Loss: {:.3f}.. ".format(train_loss/len(X_train_t)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(X_test_t)))
Training is not happening:
Epoch: 0/2000.. Training Loss: 0.014.. Test Loss: 0.082..
Epoch: 100/2000.. Training Loss: 0.014.. Test Loss: 0.082..
...
Comments (1)
The source of your problem is that you apply the softmax operation to the output of self.fc2. The output of self.fc2 has a size of 1, so the output of the softmax will be 1 regardless of the input. Read more on the softmax activation function in the PyTorch package here. I suspect that you wanted to use the sigmoid function to transform the output of the last linear layer to the interval [0, 1] and then apply a log function of some sort.
Because the softmax results in an output of 1 regardless of the input, the model does not train well. I do not have access to your data, so I cannot simulate it exactly, but from the information I have, replacing the softmax activation with the sigmoid should solve this.
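As a quick illustration (a minimal sketch with made-up logits, not taken from the post above): softmax over a dimension of size 1 normalizes each row against itself, so every output is exactly 1 and every log_softmax output is exactly 0, which leaves the loss with nothing to learn from.

import torch
import torch.nn.functional as F

# logits from a layer with a single output unit: shape (batch, 1)
logits = torch.tensor([[-3.0], [0.5], [7.2]])

# each row is normalized over its single element, so softmax is always 1
print(F.softmax(logits, dim=1))      # tensor([[1.], [1.], [1.]])
print(F.log_softmax(logits, dim=1))  # tensor([[0.], [0.], [0.]])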
A better and more numerically stable approach is to use BCEWithLogitsLoss instead of the criterion in
criterion = nn.BCELoss()
and to remove the activation function at the end, since this criterion applies the sigmoid together with the BCE loss for a more stable numerical computation.
To summarize, my advice is to change
criterion = nn.BCELoss()
to
criterion = nn.BCEWithLogitsLoss()
and change the forward function as follows:
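A minimal sketch of what that change could look like (a reconstruction, assuming the rest of the model stays the same): the final log_softmax is dropped so the model returns raw logits, which nn.BCEWithLogitsLoss expects.

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(22, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        # return raw logits; nn.BCEWithLogitsLoss applies the sigmoid internally
        return self.fc2(x)

criterion = nn.BCEWithLogitsLoss()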
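For reference, a small check (with made-up numbers, not part of the original post) showing that nn.BCEWithLogitsLoss on raw logits matches nn.BCELoss applied after an explicit sigmoid; the fused version is simply the numerically safer way to compute the same quantity:

import torch
from torch import nn

logits = torch.tensor([[2.0], [-1.0], [0.3]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

fused = nn.BCEWithLogitsLoss()(logits, targets)
manual = nn.BCELoss()(torch.sigmoid(logits), targets)

# the two losses agree up to floating-point error
print(fused.item(), manual.item())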