How can I make Vertex AI AutoML multi-label classification not ignore texts without labels?
I prepared a training dataset for multi-label classification in JSON Lines format, as described in the docs.
My upload file looks like this:
{
"textContent": "This text corresponds to 2 labels",
"classificationAnnotations": [
{"displayName": "LABEL_1"},
{"displayName": "LABEL_2"}
]
}
{
"textContent": "This text doesn't correspond to any labels",
"classificationAnnotations": []
}
// ... and other 5,853 lines
Only 1,037 texts have a non-empty list of labels.
The other texts are treated as "Unlabeled", and AutoML ignores unlabeled texts.
As a workaround, I added an extra label to every text:
{
"textContent": "This text corresponds to 2 labels",
"classificationAnnotations": [
{"displayName": "LABEL_1"},
{"displayName": "LABEL_2"},
{"displayName": "EXTRA_LABEL"}
]
}
{
"textContent": "This text doesn't correspond to any labels",
"classificationAnnotations": [
{"displayName": "EXTRA_LABEL"}
]
}
// ... and other 5,853 texts
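The workaround above could be scripted rather than applied by hand. This is a minimal sketch, assuming the upload file is valid JSON Lines (one object per line, unlike the pretty-printed excerpt shown here). The helper names `add_extra_label` and `convert_lines` are hypothetical, and `EXTRA_LABEL` is just the placeholder label from the example.

```python
import json

EXTRA_LABEL = "EXTRA_LABEL"  # placeholder label name taken from the example above


def add_extra_label(record, label=EXTRA_LABEL):
    """Return a copy of `record` with `label` appended to its annotations.

    The label is added only if it is not already present, so running the
    conversion twice does not create duplicates.
    """
    annotations = list(record.get("classificationAnnotations") or [])
    if not any(a.get("displayName") == label for a in annotations):
        annotations.append({"displayName": label})
    return {**record, "classificationAnnotations": annotations}


def convert_lines(lines):
    """Yield converted JSON Lines strings, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(add_extra_label(json.loads(line)), ensure_ascii=False)
```

Passing an open file object to `convert_lines` and writing each yielded string followed by a newline produces the modified upload file. Texts that carry only `EXTRA_LABEL` then stand in for the "0 labels" case.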
Is there a way to make AutoML use "Unlabeled" texts as texts with 0 labels?
Comments (1)
We often represent unlabeled text as an all-zero label vector for training. I don't think this can be done in AutoML for now.
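To illustrate what an all-zero vector means here, this is a small sketch of multi-hot label encoding as it might be done in a custom training pipeline outside AutoML. The function name `multi_hot` and the fixed label order are assumptions made for the example, not part of any Vertex AI API.

```python
def multi_hot(annotations, label_names):
    """Encode a record's annotations as a 0/1 vector ordered by `label_names`.

    A record with an empty annotation list yields an all-zero vector,
    i.e. a valid training example that belongs to none of the classes.
    """
    present = {a["displayName"] for a in annotations}
    return [1 if name in present else 0 for name in label_names]


LABELS = ["LABEL_1", "LABEL_2"]  # label set from the question's example

two_labels = multi_hot(
    [{"displayName": "LABEL_1"}, {"displayName": "LABEL_2"}], LABELS
)
no_labels = multi_hot([], LABELS)
```

With this encoding, `two_labels` is `[1, 1]` and `no_labels` is `[0, 0]`, so texts without labels still contribute negative examples for every class.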