My CNN's training loss is increasing?


I am training my first CNN to solve a multi-class classification problem. I am feeding in images of animals corresponding to one of 182 classes, but I have run into some issues. First, my code appears to get stuck on optimizer.step(); it has been running for roughly 30 minutes. Second, my training loss is increasing:

EPOCH: 0 BATCH: 1999 LOSS: 1.5790680234357715
EPOCH: 0 BATCH: 3999 LOSS: 2.9340945997834207

If anyone could provide some guidance, I would greatly appreciate it. My code is below:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# loading data -- `dataset` and `get_train_loader` appear to come from the WILDS
# data-loading helpers; the dataset object itself is created before this snippet
train_data = dataset.get_subset(
    "train",
    transform=transforms.Compose(
        [transforms.Resize((448, 448)), transforms.ToTensor()]
    ),
)

train_loader = get_train_loader("standard", train_data, batch_size=16)
# defining the model
class ConvNet(nn.Module):

  def __init__(self):
    super(ConvNet, self).__init__()
    self.conv1 = nn.Conv2d(3, 6, 3, 1)
    self.conv2 = nn.Conv2d(6, 16, 3, 3)
    self.fc1 = nn.Linear(37*37*16, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 182)

  def forward(self, X):
    X = F.relu(self.conv1(X))
    X = F.max_pool2d(X, 2, 2)
    X = F.relu(self.conv2(X))
    X = F.max_pool2d(X, 2, 2)
    X = torch.flatten(X, 1)
    X = F.relu(self.fc1(X))
    X = F.relu(self.fc2(X))
    X = self.fc3(X)
    return F.log_softmax(X, dim=1)

modell = ConvNet()  # instantiate the network before handing its parameters to the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(modell.parameters(), lr=0.001)

import time

start_time = time.time()

#VARIABLES  (TRACKER)
epochs = 2
train_losses = []
test_losses = []
train_correct = []
test_correct = []

# FOR LOOP EPOCH
for i in range(epochs):
  trn_corr = 0
  tst_corr = 0

  running_loss = 0.0
  #TRAIN
  for b, (X_train, Y_train, meta) in enumerate(train_loader):
    
    b+=1 #batch starts at 1

    #zero parameter gradients
    optimizer.zero_grad()

    # pass training to model as float (later compute loss)
    output = modell(X_train.float())

    #Calculate the loss of outputs with respect to ground truth values
    loss = criterion(output, Y_train)

    #Backpropagate the loss through the network
    loss.backward()

    #perform parameter update based on the current gradient
    optimizer.step()

    predicted = torch.max(output.data, 1)[1]


    batch_corr = (predicted == Y_train).sum() # True (1) or False (0)
    trn_corr += batch_corr

    running_loss += loss.item()

    if b%2000 == 1999:
      print(f"EPOCH: {i} BATCH: {b} LOSS: {running_loss/2000}")
      running_loss = 0.0

train_losses.append(loss)
train_correct.append(trn_corr)


Comments (1)

∝单色的世界 2025-01-18 00:40:29

As for the loss, it may be due to the model. There is some room for improvement: only two convolution layers, expanding to just 16 channels, is not sufficient for your data. Use more convolution layers with more channels, for example five conv layers with 16, 32, 32, 64, 64 channels, and experiment with different numbers of layers and channels to see which works best (a rough sketch follows below). Also, a good learning rate for Adam is 3e-4. To track the model's progress more easily, I would also recommend shortening the interval at which you print the loss.
As for the data, are there enough instances of each class? Is it normalized between 0 and 1? (A quick way to check both is shown after the sketch.)
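A minimal sketch of what that deeper network could look like, purely as an illustration: the padding, pooling, the global-average-pool head, the `DeeperConvNet` name and the reuse of the questioner's `modell` variable are my own assumptions, not something taken from the post.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeperConvNet(nn.Module):
    """Five conv blocks with 16, 32, 32, 64, 64 channels, as suggested above."""

    def __init__(self, num_classes=182):
        super().__init__()
        channels = [3, 16, 32, 32, 64, 64]
        self.convs = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
             for c_in, c_out in zip(channels[:-1], channels[1:])]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)           # collapse the remaining H x W
        self.fc = nn.Linear(channels[-1], num_classes)

    def forward(self, x):
        for conv in self.convs:
            x = F.max_pool2d(F.relu(conv(x)), 2)      # halve H and W after each block
        x = self.pool(x).flatten(1)
        return self.fc(x)  # raw logits: nn.CrossEntropyLoss applies log_softmax internally

modell = DeeperConvNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(modell.parameters(), lr=3e-4)   # the suggested Adam learning rate

Global average pooling is used here only to keep the classifier head small for 448x448 inputs; a flattened fully connected head like the one in the question would work just as well.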

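And to answer the two data questions quickly, something like the following could help. It is purely illustrative and uses only the `train_loader` and transforms from the question; the label-count pass iterates over the whole loader once, so it is slow but simple.

import collections

# Peek at one batch: transforms.ToTensor() already scales images to [0, 1],
# so the printed range should fall inside that interval.
X, y, _meta = next(iter(train_loader))
print("pixel range:", X.min().item(), "to", X.max().item())

# Count labels across the training set to spot badly under-represented classes.
label_counts = collections.Counter()
for _, y, _ in train_loader:
    label_counts.update(y.tolist())
print("rarest classes:", label_counts.most_common()[-5:])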
