Tagging words of varying lengths in order

Posted on 2025-01-17 14:15:45


Hi, I am trying to tag the words in a sentence in order.
For example (my initial method):

Sentence: Work across a wide range of related areas
Label:    Tag    O    O O    O     O  Tag     Tag

But now I need it to tag two words together as a single keyword with one label, like this:

Sentence: Work across a wide range of related areas
Label:    Tag    O    O O    O     O  Tag     

I have a list of keywords of varying lengths and their tags. How can I tag the sentence this way, keeping the sentence order?


Comments (1)

霓裳挽歌倾城醉 2025-01-24 14:15:45


It looks like you are looking for the BIO tagging scheme (assuming I understood you correctly and you want a solution for manually tagged corpora).

BIO denotes the following: B is the beginning of a chunk, I is a token inside a chunk, and O is a token outside any chunk.

Step 1

Sentence: Work across a wide range of related areas
Tag:       B     O    O   O    O    O   B        I
Label:  Label_1  O    O   O    O    O   Label_2  Label_2 

Step 2

Sentence: Work across a wide range of related areas
Label:  B-Label_1  O    O   O    O    O   B-Label_2  I-Label_2 
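
If the keywords are already in a list, the Step 2 labels could also be produced automatically instead of by hand. Below is a minimal sketch, assuming the keyword list is stored as a dict that maps a tuple of lowercased words to its label. That format, the function name `bio_tag`, and the greedy longest-match strategy are my own assumptions for illustration, not something from the question.

```python
def bio_tag(tokens, keywords):
    """Tag tokens with BIO labels using a greedy longest-match lookup.

    `keywords` is a hypothetical format: {("related", "areas"): "Label_2", ...}
    """
    max_len = max((len(k) for k in keywords), default=0)
    tags = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, None
        # Try the longest candidate phrase first, so a two-word keyword
        # wins over a one-word keyword starting at the same position.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in keywords:
                match_len, label = n, keywords[candidate]
                break
        if label is None:
            tags.append("O")
            i += 1
        else:
            # First word of the keyword gets B-, the remaining words get I-.
            tags.append(f"B-{label}")
            tags.extend(f"I-{label}" for _ in range(match_len - 1))
            i += match_len
    return tags


keywords = {
    ("work",): "Label_1",
    ("related", "areas"): "Label_2",
}
tokens = "Work across a wide range of related areas".split()
for token, tag in zip(tokens, bio_tag(tokens, keywords)):
    print(f"{token}\t{tag}")
```

Matching is case-insensitive and works on whitespace-split tokens only; punctuation or overlapping keywords would need extra handling.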

Once you have tagged your corpus, you will have two aligned lists: the sentence tokens (list #1) and the tag + label combinations (list #2).
The BIO tags are prefixed to your labels, e.g., [..., related, areas] + [..., B-Label_2, I-Label_2].
Because a B followed by an I marks a single chunk, you can combine [B-Label_2, I-Label_2] into one Label_2. You then strip the prefixes at the very end; expect some other intermediate steps and post-processing along the way.
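
The merge-and-strip step described above might look roughly like this. It is only a sketch: the function name `merge_bio` and the (phrase, label) output format are assumptions of mine, and real data will probably need more post-processing.

```python
def merge_bio(tokens, tags):
    """Collapse BIO-tagged tokens into (phrase, label) pairs.

    Tokens tagged "O" stay as single-word entries with the label "O".
    """
    merged = []
    for token, tag in zip(tokens, tags):
        # "B-Label_2" -> ("B", "Label_2"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")
        if prefix == "I" and merged and merged[-1][1] == label:
            # Continuation of the current chunk: extend the previous phrase.
            phrase, _ = merged[-1]
            merged[-1] = (f"{phrase} {token}", label)
        elif prefix == "O":
            merged.append((token, "O"))
        else:
            # "B-..." starts a new chunk (a stray "I-..." is treated the same way).
            merged.append((token, label))
    return merged


tokens = "Work across a wide range of related areas".split()
tags = ["B-Label_1", "O", "O", "O", "O", "O", "B-Label_2", "I-Label_2"]
print(merge_bio(tokens, tags))
```

With the example sentence, "related" and "areas" end up as one entry, "related areas", carrying the single label Label_2.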
