Loss for multi-label classification
I am working on a multi-label classification problem. My ground-truth labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is a vector whose entries are 1 if the item in the sequence belongs to the object and 0 otherwise.
My output has the same shape: 14 x 10 x 128. Since my input sequences vary in length, I had to pad them to a fixed length of 10. I'm trying to compute the loss of the model as follows:
import torch
import torch.nn as nn

total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for data in training_dataloader:
    optimizer.zero_grad()
    # shape of input 14 x 10 x 128
    output = model(data)
    batch_loss = 0.0
    for batch_idx, sequence in enumerate(output):
        # sequence shape is 10 x 128
        true_seq_len = unpadded_seq_lengths[batch_idx]
        # only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
        predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
        gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
        # loop through unpadded predicted and gt labels and calculate loss
        for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
            # predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
            gt_labels_seq_item = gt_labels[item_idx]
            current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
            total_loss += current_loss
            batch_loss += current_loss
    batch_loss.backward()
    optimizer.step()
Can anybody please check whether I'm calculating the loss correctly? Thanks.
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
    TP = FP = TN = FN = 0.
    for x, y, mask in tr_dl:
        # mask shape: (10,)
        out = model(x)  # out shape: (14, 10, 128)
        y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64)  # consider all predictions above 0.5 as 1, rest 0
        y_pred = y_pred[mask]  # y_pred shape: (14, 10, 10, 128)
        y_labels = y[mask]  # y_labels shape: (14, 10, 10, 128)
        # do I flatten y_pred and y_labels?
        y_pred = y_pred.flatten()
        y_labels = y_labels.flatten()
        for idx, prediction in enumerate(y_pred):
            if prediction == 1 and y_labels[idx] == 1:
                # calculate IOU (overlap of prediction and gt bounding box)
                iou = 0.78  # assume we get this iou value for objects at idx
                if iou >= 0.5:
                    TP += 1
                else:
                    FP += 1
            elif prediction == 1 and y_labels[idx] == 0:
                FP += 1
            elif prediction == 0 and y_labels[idx] == 1:
                FN += 1
            else:
                TN += 1
    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
Answers (3)
It is usually recommended to stick with batch-wise operations and avoid single-element processing steps in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels, plus a mask that comes in useful for loss computation. In other words, to compute the loss term with sequences of varying sizes, we use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
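A minimal sketch of such a dataset, assuming fixed SEQ_LEN, EMB_SIZE, and N_CLASSES constants (the class name and the constant values below are hypothetical, chosen to match the shapes in the question):

import torch
from torch.utils.data import Dataset

SEQ_LEN, EMB_SIZE, N_CLASSES = 10, 64, 128  # assumed values matching the question

class PaddedSequenceDataset(Dataset):
    # Illustrative dataset: pads variable-length sequences and returns a padding mask.
    def __init__(self, sequences, labels):
        # sequences[i]: float tensor (true_len, EMB_SIZE); labels[i]: float tensor (true_len, N_CLASSES)
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        x, y = self.sequences[idx], self.labels[idx]
        true_len = x.shape[0]
        # zero-pad the input and the target up to the fixed sequence length
        x_padded = torch.zeros(SEQ_LEN, EMB_SIZE)
        y_padded = torch.zeros(SEQ_LEN, N_CLASSES)
        x_padded[:true_len] = x
        y_padded[:true_len] = y
        # the mask is True on real positions and False on padded ones
        mask = torch.zeros(SEQ_LEN, dtype=torch.bool)
        mask[:true_len] = True
        return x_padded, y_padded, mask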
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element. Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors whose shapes are invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used, nn.BCEWithLogitsLoss, is the correct one, since it's a multi-dimensional loss suited for binary classification. In other words, you can use it in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss as suggested elsewhere, since its softmax pushes for a single logit (i.e. a single class), which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss, as in the sketch below.
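A sketch of that masking step, reusing model, optimizer, and the (x, y, mask) batches from the question; the key assumption is reduction='none', which keeps a per-logit loss so padded positions can be zeroed out before averaging:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(reduction='none')  # per-element loss, no averaging yet

for x, y, mask in tr_dl:
    optimizer.zero_grad()
    out = model(x)             # (batch, SEQ_LEN, N_CLASSES) logits
    loss = criterion(out, y)   # (batch, SEQ_LEN, N_CLASSES) per-logit loss
    # zero out the padded positions, then average over the real logits only
    loss = loss * mask.unsqueeze(-1).float()
    loss = loss.sum() / (mask.sum() * loss.shape[-1])
    loss.backward()
    optimizer.step()

Dividing by mask.sum() * loss.shape[-1] averages over exactly the unpadded logits, so the padding contributes nothing to the gradient.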
This is what you need for the first part of the question; there are already loss functions implemented in TensorFlow: https://medium.com/@aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because of the conditioning on the IOU. Generally, when you do machine learning, you should rely heavily on matrix operations. In your case, you probably need to pre-calculate the IOU -> 1 or 0 as a vector, then multiply it element-wise with y_pred; this gives you a modified y_pred. After that, you can use any available accuracy function to calculate the final result, as sketched below.
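A small PyTorch sketch of the gating idea this answer describes (the iou values here are made up; in practice they would come from your box-overlap computation):

import torch

# hypothetical pre-computed IoU per predicted object, thresholded to a 0/1 gate
iou = torch.tensor([0.78, 0.30, 0.61, 0.10])
iou_gate = (iou >= 0.5).to(torch.int64)  # tensor([1, 0, 1, 0])

y_pred = torch.tensor([1, 1, 1, 0])
y_true = torch.tensor([1, 0, 1, 1])

# element-wise multiply: positive predictions with insufficient overlap are demoted to 0
y_pred_mod = y_pred * iou_gate           # tensor([1, 0, 1, 0])

# any standard accuracy routine can then be applied to the modified predictions
accuracy = (y_pred_mod == y_true).float().mean()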
If you can use CrossEntropyLoss instead of BCEWithLogitsLoss, there is something called ignore_index; you can use it to exclude your padded sequences. The difference between the two losses is the activation function used (softmax vs. sigmoid), but I think you can still use CrossEntropyLoss for binary classification as well.
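For illustration, a sketch of how ignore_index is typically used; note that nn.CrossEntropyLoss expects a single class index per position, so this only applies if the problem is recast as single-label (the tensors below are made up):

import torch
import torch.nn as nn

IGNORE = -100  # the default ignore_index of nn.CrossEntropyLoss

criterion = nn.CrossEntropyLoss(ignore_index=IGNORE)

logits = torch.randn(14 * 10, 128)           # (batch * seq_len, n_classes)
targets = torch.randint(0, 128, (14 * 10,))  # one class index per sequence position
targets[7:10] = IGNORE                       # mark padded positions; they contribute no loss
loss = criterion(logits, targets)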