How can I make Vertex AI AutoML multi-label classification not ignore texts with no labels?

Posted on 2025-02-08 21:27:38 · 1447 characters · 3 views · 0 comments


I prepared a training dataset for multi-label classification in JSON Lines format, as described in the docs.

My upload file looks like this (records are pretty-printed here for readability):

{
  "textContent": "This text corresponds to 2 labels",
  "classificationAnnotations": [
    {"displayName": "LABEL_1"},
    {"displayName": "LABEL_2"}
  ]
}
{
  "textContent": "This text doesn't correspond to any labels",
  "classificationAnnotations": []
}
// ... and other 5,853 lines

Only 1,037 texts have a non-empty list of labels.

The other texts are considered "Unlabeled", and AutoML ignores unlabeled texts.
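For reference, the labeled/unlabeled split can be checked with a small script along these lines. This is only a sketch: the helper name `count_labeled` and the in-memory `sample` list are made up for illustration, and it assumes one JSON object per line, as JSON Lines requires.

```python
import json


def count_labeled(lines):
    """Return (labeled, unlabeled) counts for an iterable of JSON Lines records."""
    labeled = unlabeled = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A missing or empty "classificationAnnotations" list counts as unlabeled.
        if record.get("classificationAnnotations"):
            labeled += 1
        else:
            unlabeled += 1
    return labeled, unlabeled


# Hypothetical two-record sample mirroring the records shown above.
sample = [
    json.dumps({
        "textContent": "This text corresponds to 2 labels",
        "classificationAnnotations": [
            {"displayName": "LABEL_1"},
            {"displayName": "LABEL_2"},
        ],
    }),
    json.dumps({
        "textContent": "This text doesn't correspond to any labels",
        "classificationAnnotations": [],
    }),
]

print(count_labeled(sample))
```

With a real file, the same function can be called on an open file handle, since iterating over a text file yields its lines.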

As a workaround, I added an extra label to every text:

{
  "textContent": "This text corresponds to 2 labels",
  "classificationAnnotations": [
    {"displayName": "LABEL_1"},
    {"displayName": "LABEL_2"},
    {"displayName": "EXTRA_LABEL"}
  ]
}
{
  "textContent": "This text doesn't correspond to any labels",
  "classificationAnnotations": [
    {"displayName": "EXTRA_LABEL"}
  ]
}
// ... and other 5,853 texts
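The workaround above can be applied with a script roughly like the following. It is a minimal sketch, not an official Vertex AI utility: the function name `add_extra_label` is hypothetical, and it assumes each input line is a single JSON object with the fields shown above.

```python
import json

EXTRA_LABEL = "EXTRA_LABEL"  # the placeholder label used in the workaround


def add_extra_label(line, extra=EXTRA_LABEL):
    """Return the JSON Lines record with `extra` appended to its annotations."""
    record = json.loads(line)
    # Create the list if the key is missing, so unlabeled records are handled too.
    annotations = record.setdefault("classificationAnnotations", [])
    # Avoid adding the label twice if the script is re-run on its own output.
    if not any(a.get("displayName") == extra for a in annotations):
        annotations.append({"displayName": extra})
    return json.dumps(record, ensure_ascii=False)


# Hypothetical unlabeled record, as in the example above.
unlabeled = json.dumps({
    "textContent": "This text doesn't correspond to any labels",
    "classificationAnnotations": [],
})
print(add_extra_label(unlabeled))
```

Running it over every line of the upload file and writing the results to a new file produces the modified dataset; the original file stays untouched.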

Is there a way to make AutoML use "Unlabeled" texts as texts with 0 labels?
