What should I be thinking about when writing a custom loss function?



I'm trying to get my toy network to learn a sine wave.

I output (via tanh) a number between -1 and 1, and I want the network to minimise the following loss, where self(x) are the predictions.

loss = -torch.mean(self(x)*y)

This should be equivalent to trading a stock with a sinusoidal price, where self(x) is our desired position, and y are the returns of the next time step.
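To make the analogy concrete, here is a tiny numerical sketch (made-up numbers, not part of the original post): the loss is simply the negative of the average per-step profit from holding position self(x) through return y.

import torch

position = torch.tensor([0.8, -0.5, 1.0])        # self(x): desired positions in [-1, 1]
next_return = torch.tensor([0.02, -0.03, 0.01])  # y: next-step log-returns
pnl = position * next_return                     # per-step profit and loss
loss = -pnl.mean()                               # minimising this maximises average profit
print(loss)                                      # ≈ tensor(-0.0137)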

The issue I'm having is that the network doesn't learn anything. It does work if I change the loss function to be torch.mean((self(x)-y)**2) (MSE), but this isn't what I want. I'm trying to focus the network on 'making a profit', not making a prediction.

I think the issue may be related to the convexity of the loss function, but I'm not sure, and I'm not certain how to proceed. I've experimented with differing learning rates, but alas nothing works.

What should I be thinking about?

Actual code:

%load_ext tensorboard
import matplotlib.pyplot as plt; plt.rcParams["figure.figsize"] = (30,8)
import torch;from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F;import pytorch_lightning as pl
from torch import nn, tensor
def piecewise(x): return 2*(x>0)-1

class TsDs(torch.utils.data.Dataset):
  def __init__(self, s, l=5): super().__init__();self.l,self.s=l,s
  def __len__(self): return self.s.shape[0] - 1 - self.l
  def __getitem__(self, i): return self.s[i:i+self.l], torch.log(self.s[i+self.l+1]/self.s[i+self.l])
  def plt(self): plt.plot(self.s)

class TsDm(pl.LightningDataModule):
  def __init__(self, length=5000, batch_size=1000): super().__init__();self.batch_size=batch_size;self.s = torch.sin(torch.arange(length)*0.2) + 5 + 0*torch.rand(length)
  def train_dataloader(self): return DataLoader(TsDs(self.s[:3999]), batch_size=self.batch_size, shuffle=True)
  def val_dataloader(self): return DataLoader(TsDs(self.s[4000:]), batch_size=self.batch_size)

dm = TsDm()

class MyModel(pl.LightningModule):
    def __init__(self, learning_rate=0.01):
        super().__init__();self.learning_rate = learning_rate
        self.conv1 = nn.Conv1d(1,5,2)
        self.lin1 = nn.Linear(20,3);self.lin2 = nn.Linear(3,1)
        # self.network = nn.Sequential(nn.Conv1d(1,5,2),nn.ReLU(),nn.Linear(20,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())
        # self.network = nn.Sequential(nn.Linear(5,5),nn.ReLU(),nn.Linear(5,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())
    def forward(self, x): 
        out = x.unsqueeze(1)
        out = self.conv1(out)
        out = out.reshape(-1,20)
        out = nn.ReLU()(out)
        out = self.lin1(out)
        out = nn.ReLU()(out)
        out = self.lin2(out)
        return nn.Tanh()(out)

    def step(self, batch, batch_idx, stage):
        x, y = batch
        loss = -torch.mean(self(x)*y)
        # loss = torch.mean((self(x)-y)**2)
        print(loss)
        self.log("loss", loss, prog_bar=True)
        return loss
    def training_step(self, batch, batch_idx): return self.step(batch, batch_idx, "train")
    def validation_step(self, batch, batch_idx): return self.step(batch, batch_idx, "val")
    def configure_optimizers(self): return torch.optim.SGD(self.parameters(), lr=self.learning_rate)

#logger = pl.loggers.TensorBoardLogger(save_dir="/content/")
mm = MyModel(0.1);trainer = pl.Trainer(max_epochs=10)
# trainer.tune(mm, dm)
trainer.fit(mm, datamodule=dm)


Answer by 狠疯拽:


If I understand you correctly, I think you are trying to maximize the unnormalized correlation between the network's prediction, self(x), and the target value y.

As you mention, the problem is the convexity of the loss with respect to the model weights. One way to see the problem is to consider a model that is a simple linear predictor w'*x, where w is the model weights, w' its transpose, and x the input feature vector (assume a scalar prediction for now). Then, if you look at the derivative of the loss with respect to the weight vector (i.e., the gradient), you'll find that it no longer depends on w!
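A quick way to check this claim (a sketch I'm adding, not part of the original answer) is to evaluate the gradient with autograd for two very different weight vectors and observe that it comes out identical, since it equals -mean(x*y) regardless of w:

import torch

# Sketch: for a linear predictor w'x and loss = -mean((w'x) * y),
# the gradient w.r.t. w is the mean over samples of -x*y, independent of w.
torch.manual_seed(0)
x = torch.randn(100, 5)   # 100 samples, 5 features (arbitrary sizes)
y = torch.randn(100)      # arbitrary targets

for scale in (0.1, 10.0):                # two very different weight vectors
    w = (scale * torch.ones(5)).requires_grad_()
    loss = -torch.mean((x @ w) * y)
    loss.backward()
    print(w.grad)                        # identical for both scales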

One way to fix this is to change the loss to,

loss = -torch.mean(torch.square(self(x)*y))

or

loss = -torch.mean(torch.abs(self(x)*y))

You will have another big problem, however: these loss functions encourage unbounded growth of the model weights. In the linear case, one solves this with a Lagrangian relaxation of a hard constraint on, for example, the norm of the model weight vector. I'm not sure how this would be done with neural networks, as each layer would need its own Lagrangian parameter...
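As a rough practical analogue (my own sketch, not from the answer above), one can add an explicit L2 penalty on all parameters to the loss, which plays the same role as the norm constraint; the lambda_reg value below is a made-up tuning knob:

import torch
from torch import nn

# Sketch: discourage unbounded weight growth by penalising the squared
# L2 norm of every parameter, added on top of the profit-style loss.
model = nn.Sequential(nn.Linear(5, 3), nn.ReLU(), nn.Linear(3, 1), nn.Tanh())
x, y = torch.randn(8, 5), torch.randn(8)   # dummy batch
lambda_reg = 1e-3                          # hypothetical penalty strength

penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = -torch.mean(torch.abs(model(x).squeeze(-1) * y)) + lambda_reg * penalty
loss.backward()

Equivalently, passing weight_decay to torch.optim.SGD applies the same L2 pressure through the optimizer.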
