How to disable seqeval's label formatting for POS tagging

Posted on 2025-01-11 08:21:54

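I am trying to evaluate my POS tagger using Huggingface's implementation of the seqeval metric. Since my tags were not made for NER, they are not formatted the way the library expects. As a result, when I read the classification report, the label of every class-specific row is missing its first character (or its last character if I pass suffix=True).

Is there a way to disable entity recognition in the labels, or do I have to pass all my labels with a leading space to work around this? (Given that the library is supposed to be suitable for POS tagging, I hope there is a built-in solution.)

SSCCE: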

from seqeval.metrics import accuracy_score
from seqeval.metrics import classification_report
from seqeval.metrics import f1_score

y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

print(classification_report(y_true, y_pred))

Output:

	precision	recall	f1-score	support
DV	1.00	1.00	1.00	2
ER:pres	1.00	1.00	1.00	1
NT	1.00	1.00	1.00	1
RO	1.00	1.00	1.00	1
RP	1.00	1.00	1.00	1
micro avg	1.00	1.00	1.00	6
macro avg	1.00	1.00	1.00	6
weighted avg	1.00	1.00	1.00	6
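
To make the symptom concrete, here is a minimal sketch of what seems to be happening, together with the kind of workaround I have in mind. It does not call seqeval. The helper names `split_tag` and `add_begin_prefix` are hypothetical, and the splitting rule is my assumption about how IOB-style tags such as `B-PER` get parsed, not something confirmed from the library's documentation.

```python
# Sketch only: mimics the marker/type split that appears to truncate the labels.
# Both helpers are hypothetical and are not part of seqeval.

def split_tag(tag, suffix=False):
    """Assumed rule: one character is the chunk marker, the rest is the type."""
    if suffix:
        # Marker is the last character, e.g. 'PER-B' -> ('B', 'PER')
        return tag[-1], tag[:-1].rsplit('-', 1)[0]
    # Marker is the first character, e.g. 'B-PER' -> ('B', 'PER')
    return tag[0], tag[1:].split('-', 1)[-1]


def add_begin_prefix(sequences, prefix='B-'):
    """Possible workaround: give every POS tag an explicit chunk marker."""
    return [[prefix + tag for tag in seq] for seq in sequences]


y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

# Plain POS tags lose their first character under the assumed rule.
print(split_tag('INT'))       # ('I', 'NT')
print(split_tag('VER:pres'))  # ('V', 'ER:pres')

# With a marker prepended, the full tag survives as the type.
prefixed = add_begin_prefix(y_true)
print(split_tag(prefixed[0][3]))  # ('B', 'VER:pres')
```

If the assumption holds, passing the prefixed lists for both y_true and y_pred to classification_report should report the full tag names, but I have not verified that here, and I would still prefer a built-in option over rewriting every label.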
