Loss for multi-label classification
I am working on a multi-label classification problem. My ground-truth labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is a vector whose entries are 1 if the item in the sequence belongs to the object and 0 otherwise.
My output has the same shape: 14 x 10 x 128. Since my input sequences vary in length, I had to pad them to a fixed length of 10. I'm trying to compute the loss of the model as follows:
import torch
import torch.nn as nn

total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for data in training_dataloader:
    optimizer.zero_grad()
    # shape of input 14 x 10 x 128
    output = model(data)
    batch_loss = 0.0
    for batch_idx, sequence in enumerate(output):
        # sequence shape is 10 x 128
        true_seq_len = unpadded_seq_lengths[batch_idx]
        # only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
        predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
        gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
        # loop through unpadded predicted and gt labels and calculate loss
        for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
            # predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
            gt_labels_seq_item = gt_labels[item_idx]
            current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
            total_loss += current_loss
            batch_loss += current_loss
    batch_loss.backward()
    optimizer.step()
Can anybody please check whether I'm calculating the loss correctly? Thanks.
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
    TP = FP = TN = FN = 0.
    for x, y, mask in tr_dl:
        # mask shape: (10,)
        out = model(x)  # out shape: (14, 10, 128)
        y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64)  # consider all predictions above 0.5 as 1, rest 0
        y_pred = y_pred[mask]  # y_pred shape: (14, 10, 10, 128)
        y_labels = y[mask]  # y_labels shape: (14, 10, 10, 128)
        # do I flatten y_pred and y_labels?
        y_pred = y_pred.flatten()
        y_labels = y_labels.flatten()
        for idx, prediction in enumerate(y_pred):
            if prediction == 1 and y_labels[idx] == 1:
                # calculate IOU (overlap of prediction and gt bounding box)
                iou = 0.78  # assume we get this iou value for objects at idx
                if iou >= 0.5:
                    TP += 1
                else:
                    FP += 1
            elif prediction == 1 and y_labels[idx] == 0:
                FP += 1
            elif prediction == 0 and y_labels[idx] == 1:
                FN += 1
            else:
                TN += 1
    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
Answers (3)
It is usually recommended to stick with batch-wise operations and avoid single-element processing steps in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels, plus a mask that comes in useful for loss computation. In other words, to compute the loss term with sequences of varying sizes, we use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
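A minimal sketch of such a dataset, assuming fixed SEQ_LEN, EMB_SIZE, and N_CLASSES constants (the class name and the constant values below are hypothetical, chosen to match the shapes in the question):

import torch
from torch.utils.data import Dataset

SEQ_LEN, EMB_SIZE, N_CLASSES = 10, 64, 128  # assumed values matching the question

class PaddedSequenceDataset(Dataset):
    # Illustrative dataset: pads variable-length sequences and returns a padding mask.
    def __init__(self, sequences, labels):
        # sequences[i]: float tensor (true_len, EMB_SIZE); labels[i]: float tensor (true_len, N_CLASSES)
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        x, y = self.sequences[idx], self.labels[idx]
        true_len = x.shape[0]
        # zero-pad the input and the target up to the fixed sequence length
        x_padded = torch.zeros(SEQ_LEN, EMB_SIZE)
        y_padded = torch.zeros(SEQ_LEN, N_CLASSES)
        x_padded[:true_len] = x
        y_padded[:true_len] = y
        # the mask is True on real positions and False on padded ones
        mask = torch.zeros(SEQ_LEN, dtype=torch.bool)
        mask[:true_len] = True
        return x_padded, y_padded, mask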
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element. Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors whose shapes are invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used, nn.BCEWithLogitsLoss, is the correct one, since it's a multi-dimensional loss suited for binary classification. In other words, you can use it in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss as suggested elsewhere, since its softmax pushes for a single logit (i.e. a single class), which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss, as in the sketch below.
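A sketch of that masking step, reusing model, optimizer, and the (x, y, mask) batches from the question; the key assumption is reduction='none', which keeps a per-logit loss so padded positions can be zeroed out before averaging:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(reduction='none')  # per-element loss, no averaging yet

for x, y, mask in tr_dl:
    optimizer.zero_grad()
    out = model(x)             # (batch, SEQ_LEN, N_CLASSES) logits
    loss = criterion(out, y)   # (batch, SEQ_LEN, N_CLASSES) per-logit loss
    # zero out the padded positions, then average over the real logits only
    loss = loss * mask.unsqueeze(-1).float()
    loss = loss.sum() / (mask.sum() * loss.shape[-1])
    loss.backward()
    optimizer.step()

Dividing by mask.sum() * loss.shape[-1] averages over exactly the unpadded logits, so the padding contributes nothing to the gradient.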
This is what you need for the first part of the question; there are already loss functions implemented in TensorFlow: https://medium.com/@aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because of the conditioning on the IOU. Generally, when you do machine learning, you should rely heavily on matrix operations. In your case, you probably need to pre-calculate the IOU -> 1 or 0 as a vector, then multiply it element-wise with y_pred; this gives you a modified y_pred. After that, you can use any available accuracy function to calculate the final result, as sketched below.
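A small PyTorch sketch of the gating idea this answer describes (the iou values here are made up; in practice they would come from your box-overlap computation):

import torch

# hypothetical pre-computed IoU per predicted object, thresholded to a 0/1 gate
iou = torch.tensor([0.78, 0.30, 0.61, 0.10])
iou_gate = (iou >= 0.5).to(torch.int64)  # tensor([1, 0, 1, 0])

y_pred = torch.tensor([1, 1, 1, 0])
y_true = torch.tensor([1, 0, 1, 1])

# element-wise multiply: positive predictions with insufficient overlap are demoted to 0
y_pred_mod = y_pred * iou_gate           # tensor([1, 0, 1, 0])

# any standard accuracy routine can then be applied to the modified predictions
accuracy = (y_pred_mod == y_true).float().mean()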
If you can use CrossEntropyLoss instead of BCEWithLogitsLoss, there is something called ignore_index; you can use it to exclude your padded sequences. The difference between the two losses is the activation function used (softmax vs. sigmoid), but I think you can still use CrossEntropyLoss for binary classification as well.
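For illustration, a sketch of how ignore_index is typically used; note that nn.CrossEntropyLoss expects a single class index per position, so this only applies if the problem is recast as single-label (the tensors below are made up):

import torch
import torch.nn as nn

IGNORE = -100  # the default ignore_index of nn.CrossEntropyLoss

criterion = nn.CrossEntropyLoss(ignore_index=IGNORE)

logits = torch.randn(14 * 10, 128)           # (batch * seq_len, n_classes)
targets = torch.randint(0, 128, (14 * 10,))  # one class index per sequence position
targets[7:10] = IGNORE                       # mark padded positions; they contribute no loss
loss = criterion(logits, targets)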