Network stops learning once the batch size is set to > 1

Posted 2025-01-19 11:14:53

I started switching from Keras to PyTorch today and played around with a simple feedforward network. It is supposed to learn the squaring operation, i.e. f(x) = x^2. However, my network only learns reasonably if I set the batch size to 1; any other batch size yields very poor results. I also tried different learning rates between 1 and 0.0001 to see if that would fix it, and tested a few changes to the network, but to no avail. Could anyone tell me what I am doing wrong, i.e. why does my network not learn once I set the batch size to any value above 1? A minimal working example is below. Thank you for your help!

import numpy as np
from random import randint
import random
import time
from multiprocessing import Pool
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms

class SquareDataset(Dataset):
     def __init__(self, num_samples):
         super(Dataset, self).__init__()
         self.num_samples = num_samples
         self.train  = [None] * num_samples
         self.target = [None] * num_samples
         
         for i in range(0, num_samples):
             self.train[i]  = random.random() * randint(1, 10)
             self.target[i] =  self.train[i] ** 2
             
     def __len__(self):
         return self.num_samples
        
     def __getitem__(self, index):
        return self.train[index], self.target[index]



def trainNetwork(epochs=50):
    data_train = SquareDataset(num_samples=1000)
    data_train_loader = DataLoader(data_train, batch_size=1, shuffle=False)

    model = nn.Sequential(nn.Linear(1, 32),
                      nn.ReLU(),
                      nn.Linear(32, 32),
                      nn.ReLU(),
                      nn.Linear(32, 1))
    # Define the loss
    criterion = nn.MSELoss()
    # Optimizers require the parameters to optimize and a learning rate
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    for e in range(epochs):
        running_loss = 0
        for number, labels in data_train_loader:
            optimizer.zero_grad()
            number = number.view(number.size(0), -1)
            output = model(number.float())
            loss = criterion(output, labels.float())
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
        else:
            print(f"Training loss: {running_loss/len(data_train_loader)}")
    # some test outputs
    sample = torch.tensor([0.2])
    out = model(sample.float())
    print("Out:")
    print(out.item())
    sample = torch.tensor([1])
    out = model(sample.float())
    print("Out:")
    print(out.item())

trainNetwork()


Comments (1)

巷子口的你 2025-01-26 11:14:53


On the line loss = criterion(output, labels.float()), the first tensor (output) has shape (batch_size, 1), while labels has shape (batch_size,). Hence, when batch_size > 1, broadcasting occurs and the loss is computed against the wrong objective: every prediction is compared with every label in the batch. With batch_size = 1 the broadcast is harmless, which is why training only works in that case. To fix the issue, rewrite the loss line so that both tensors have the same shape, for example:

loss = criterion(output.squeeze(-1), labels.float())
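
To make the effect concrete, here is a minimal standalone sketch (not part of the original script) showing how nn.MSELoss silently broadcasts a (batch_size, 1) prediction against a (batch_size,) target, and how matching the shapes fixes it:

import torch
from torch import nn

criterion = nn.MSELoss()

# Model output as produced by nn.Linear(32, 1): shape (3, 1)
output = torch.tensor([[1.0], [2.0], [3.0]])
# Targets as yielded by the DataLoader: shape (3,)
labels = torch.tensor([1.0, 2.0, 3.0])

# Mismatched shapes: both tensors broadcast to (3, 3), so every prediction
# is compared with every label. These "perfect" predictions still give a
# non-zero loss (recent PyTorch versions also emit a UserWarning here).
print(criterion(output, labels))                # tensor(1.3333)

# Matched shapes: element-wise comparison, as intended.
print(criterion(output.squeeze(-1), labels))    # tensor(0.)

An equivalent fix is to keep the (batch_size, 1) output and instead unsqueeze the labels, e.g. loss = criterion(output, labels.float().unsqueeze(-1)); either way, the two tensors must end up with the same shape so that no broadcasting takes place.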