PyTorch - F.cross_entropy loss for sequence classification - correct dimensions?
I am trying to perform sequence classification using a custom implementation of a transformer encoder layer. I have been following this tutorial pretty faithfully: tutorial.
The tutorial, however, does not demonstrate an example of using this model to classify a whole sequence. After a little bit of searching, I have come up with the following training function:
import torch
import torch.nn.functional as F

class Pred(TransformerPred):

    def _get_loss(self, batch, mode='train'):
        inp_data, labels = batch
        preds = self.forward(inp_data, pos_enc=True)   # (batch, seq_len, num_classes)
        preds = torch.mean(preds, dim=1)               # average over the sequence dimension
        loss = F.cross_entropy(preds, labels[:, 0])    # first column of the one-hot labels as target
        acc = (preds.argmax(dim=-1) == labels[:, 0]).float().mean()
        return loss, acc

    def training_step(self, batch, batch_idx):
        loss, _ = self._get_loss(batch, mode='train')
        return loss
where
inp_data.size() => torch.Size([4, 371, 1])
labels.size() => torch.Size([4, 2])
preds.size() => torch.Size([4, 371, 2])
Currently I am performing binary classification, so in this small example the batch size is 4, the sequence length is 371, and there are 2 classes. The labels are one-hot encoded, meaning [1, 0] for class 0 and [0, 1] for class 1. My input has an embedding dimension of 1. I have read that F.cross_entropy loss is not necessarily the best idea for binary classification, but I am planning to extend this to a few more classes, so I want it to be generic.
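As far as I understand from the PyTorch docs, F.cross_entropy expects raw (unnormalized) logits of shape (N, C) and integer class indices of shape (N,), and applies log_softmax plus negative log-likelihood internally. A minimal standalone sketch of that contract, with made-up values (one-hot labels converted to indices via argmax):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)                # raw scores, shape (batch, num_classes)
one_hot = torch.tensor([[1., 0.],
                        [0., 1.],
                        [1., 0.],
                        [0., 1.]])        # one-hot labels as in my setup
targets = one_hot.argmax(dim=1)           # class indices: tensor([0, 1, 0, 1])
loss = F.cross_entropy(logits, targets)   # log_softmax + NLL in one call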
My question is: since the encoder outputs a value per sequence position per class, I read that averaging those values over the sequence dimension could be useful when trying to classify the whole sequence.
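Concretely, the pooling step I mean looks like this (shapes match my example above; the tensor contents here are just random placeholders):

import torch

preds = torch.randn(4, 371, 2)   # encoder output: (batch, seq_len, num_classes)
pooled = preds.mean(dim=1)       # (batch, num_classes): one logit vector per whole sequence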
What I observe during training, however, are values like: tensor([[ 0.0863, -0.1591], [-0.1827, -0.4415], [-0.0477, -0.2966], [-0.1693, -0.4047]]), i.e. negative values, with class 0 always having the higher value. Is there something wrong with this approach? I am not sure I understand how F.cross_entropy works, or how I should use the transformer encoder to perform classification of a whole sequence.
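For concreteness, this is my understanding of what F.cross_entropy does internally to the first row of the output above, which is why I would expect negative logits to still yield valid probabilities:

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0863, -0.1591]])  # first row of the tensor above
probs = F.softmax(logits, dim=-1)           # ≈ tensor([[0.5611, 0.4389]])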