PyTorch - F.cross_entropy loss for sequence classification - correct dimensions?
I am trying to perform sequence classification using a custom implementation of a transformer encoder layer. I have been following this tutorial pretty faithfully: tutorial.
The tutorial, however, does not demonstrate an example of using this model to classify a whole sequence. After a little bit of searching, I have come up with the following training function:
import torch
import torch.nn.functional as F

class Pred(TransformerPred):

    def _get_loss(self, batch, mode='train'):
        inp_data, labels = batch
        preds = self.forward(inp_data, pos_enc=True)   # (batch, seq_len, num_classes)
        preds = torch.mean(preds, dim=1)               # average over the sequence dimension
        loss = F.cross_entropy(preds, labels[:, 0])    # first column of the one-hot labels as target
        acc = (preds.argmax(dim=-1) == labels[:, 0]).float().mean()
        return loss, acc

    def training_step(self, batch, batch_idx):
        loss, _ = self._get_loss(batch, mode='train')
        return loss
where
inp_data.size() => torch.Size([4, 371, 1])
labels.size() => torch.Size([4, 2])
preds.size() => torch.Size([4, 371, 2])
Currently I am performing binary classification, so in this small example the batch size is 4, the sequence length is 371, and there are 2 classes. The labels are one-hot encoded, meaning [1, 0] for class 0 and [0, 1] for class 1. My input has an embedding dimension of 1. I have read that F.cross_entropy loss is not necessarily the best idea for binary classification, but I am planning to extend this to a few more classes, so I want it to be generic.
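As far as I understand from the PyTorch docs, F.cross_entropy expects raw (unnormalized) logits of shape (N, C) and integer class indices of shape (N,), and applies log_softmax plus negative log-likelihood internally. A minimal standalone sketch of that contract, with made-up values (one-hot labels converted to indices via argmax):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)                # raw scores, shape (batch, num_classes)
one_hot = torch.tensor([[1., 0.],
                        [0., 1.],
                        [1., 0.],
                        [0., 1.]])        # one-hot labels as in my setup
targets = one_hot.argmax(dim=1)           # class indices: tensor([0, 1, 0, 1])
loss = F.cross_entropy(logits, targets)   # log_softmax + NLL in one call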
My question is: since the encoder outputs a value per sequence position per class, I read that averaging those values over the sequence dimension could be useful when trying to classify the whole sequence.
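Concretely, the pooling step I mean looks like this (shapes match my example above; the tensor contents here are just random placeholders):

import torch

preds = torch.randn(4, 371, 2)   # encoder output: (batch, seq_len, num_classes)
pooled = preds.mean(dim=1)       # (batch, num_classes): one logit vector per whole sequence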
What I observe during training, however, are values like: tensor([[ 0.0863, -0.1591], [-0.1827, -0.4415], [-0.0477, -0.2966], [-0.1693, -0.4047]]), i.e. negative values, with class 0 always having the higher value. Is there something wrong with this approach? I am not sure I understand how F.cross_entropy works, or how I should use the transformer encoder to perform classification of a whole sequence.
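For concreteness, this is my understanding of what F.cross_entropy does internally to the first row of the output above, which is why I would expect negative logits to still yield valid probabilities:

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0863, -0.1591]])  # first row of the tensor above
probs = F.softmax(logits, dim=-1)           # ≈ tensor([[0.5611, 0.4389]])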