SGD with batch size > 1?

Asked 2025-01-25 10:11:23


I'm taking the "Deep NNs with PyTorch" course by IBM and I encountered lab examples where SGD is used as the optimizer while the batch size is > 1 in the DataLoader.

If I understand correctly, SGD performs gradient descent with only 1 training example at each step, so in this case how does SGD interact with each batch of training examples?

For example, if batch size = 20, would the SGD optimizer perform 20 GD steps for each batch? If so, does that mean that no matter what batch size I set for the DataLoader, the SGD optimizer would perform (# of training examples) GD steps in one epoch?

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Net, data_set, and accuracy are defined earlier in the lab.
def train(data_set, model, criterion, train_loader, optimizer, epochs=100):
    LOSS = []
    ACC = []
    for epoch in range(epochs):
        for x, y in train_loader:      # one iteration per mini-batch
            optimizer.zero_grad()      # clear gradients from the previous step
            yhat = model(x)            # forward pass on the whole batch
            loss = criterion(yhat, y)  # single scalar loss for the batch
            loss.backward()            # backpropagate once per batch
            optimizer.step()           # one GD step per batch
            LOSS.append(loss.item())
        ACC.append(accuracy(model, data_set))
    ...

Layers = [2, 50, 3]
model = Net(Layers)
learning_rate = 0.10
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
train_loader = DataLoader(dataset=data_set, batch_size=20)
criterion = nn.CrossEntropyLoss()
LOSS = train(data_set, model, criterion, train_loader, optimizer, epochs=100)
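One way to check this empirically is to count how many times optimizer.step() fires per epoch. Below is a minimal sketch using a hypothetical toy dataset (toy_set, not the course's data_set): with 100 samples, batch_size=1 gives 100 steps per epoch, batch_size=20 gives 5, and batch_size=100 gives 1.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 100 samples, 2 features, 3 classes
X = torch.randn(100, 2)
y = torch.randint(0, 3, (100,))
toy_set = TensorDataset(X, y)

model = torch.nn.Linear(2, 3)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for batch_size in (1, 20, 100):
    loader = DataLoader(toy_set, batch_size=batch_size)
    steps = 0
    for xb, yb in loader:       # one iteration per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()        # exactly one GD step per batch
        steps += 1
    print(batch_size, steps)    # prints: 1 100, 20 5, 100 1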


Comments (1)

我喜欢麦丽素 2025-02-01 10:11:23


"if batch size = 20, would the SGD optimizer perform 20 GD steps in each batch?"

No. Batch size = 20 means it processes all 20 samples, computes a single scalar loss, and backpropagates the error based on that loss. That is one step of GD.

This is known as minibatch SGD: instead of taking 1 input as in plain SGD, it takes 20 at a time, and everything else stays the same.
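To make the "single scalar loss" point concrete, here is a sketch of my own (toy linear model, synthetic data, not from the course) verifying that the gradient from one backward pass over a batch of 20 equals the average of the 20 per-sample gradients, because CrossEntropyLoss defaults to reduction='mean':

import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 3)
criterion = torch.nn.CrossEntropyLoss()  # reduction='mean' by default
x = torch.randn(20, 2)
y = torch.randint(0, 3, (20,))

# One backward pass on the whole batch -> one gradient, one GD step
model.zero_grad()
criterion(model(x), y).backward()
batch_grad = model.weight.grad.clone()

# Average of 20 per-sample gradients (backward() accumulates into .grad)
model.zero_grad()
for i in range(20):
    criterion(model(x[i:i+1]), y[i:i+1]).backward()
avg_grad = model.weight.grad / 20

print(torch.allclose(batch_grad, avg_grad, atol=1e-6))  # True

So a batch of 20 yields one gradient update whose direction is the average of the per-sample gradients, not 20 separate updates.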
