Tagging words of varying length in order
Hi, I am trying to tag the words in a sentence in order.
For example (my initial method):
Sentence: Work across a wide range of related areas
Label: Tag O O O O O Tag Tag
But now I need it to work like this, where two words can be tagged as a single keyword and labeled together:
Sentence: Work across a wide range of related areas
Label: Tag O O O O O Tag
I have a list of keywords of varying length and their tags. How can I tag the sentence this way, keeping the labels in sentence order?
Comments (1)
It looks like what you need is the BIO tagging scheme (if I understood you correctly, and you want a solution for manually tagged corpora).
BIO stands for the following: B - the beginning of a chunk, I - inside a chunk, O - a token outside any chunk.
Step 1
Step 2
Once you have tagged your corpus, you will have two aligned lists: the sentence tokens (list #1) and the tag + label combos (list #2).
The BIO tags are prefixed to your labels, e.g., [..., related, areas] + [..., B-Label_2, I-Label_2].
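If the keyword list is already available, the two aligned lists could be produced automatically rather than by hand. The following is only a rough sketch under some assumptions: the function name `bio_tag`, the keyword dictionary, and the label `Label_1` are made up for illustration, tokens are split on whitespace, and matching is case-insensitive with the longest keyword winning.

```python
def bio_tag(tokens, keywords):
    """Return one BIO tag per token using greedy longest-match lookup."""
    # Index each keyword by its lowercased token tuple (hypothetical format:
    # {"keyword phrase": "label"}).
    lookup = {tuple(k.lower().split()): label for k, label in keywords.items()}
    max_len = max((len(k) for k in lookup), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so "related areas" beats "related".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lookup.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Assumed example data; only Label_2 appears in the answer above.
keywords = {"work": "Label_1", "related areas": "Label_2"}
tokens = "Work across a wide range of related areas".split()
tags = bio_tag(tokens, keywords)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

The token list and the tag list stay the same length, so they remain aligned position by position.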
That way you can combine [B-Label_2, I-Label_2] into a single Label_2, since a B followed by an I marks one chunk. You just have to strip the prefixes at the very end, and you will likely need a lot of other intermediate steps and post-processing along the way.
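The merging step might look roughly like this. It is a minimal sketch, not a full pipeline: `collapse_bio` is a hypothetical helper, `Label_1` is an assumed label, and a stray I- tag with no matching chunk before it is simply treated as the start of a new chunk.

```python
def collapse_bio(tokens, tags):
    """Merge B-/I- runs into (phrase, label) pairs; O tokens keep the label "O"."""
    merged = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An I- tag continues the previous chunk only if the labels agree.
        if prefix == "I" and merged and merged[-1][1] == label:
            merged[-1] = (merged[-1][0] + " " + token, label)
        else:
            # B- tags and O tags both start a new entry; the prefix is dropped.
            merged.append((token, label if label else "O"))
    return merged


tokens = ["Work", "across", "a", "wide", "range", "of", "related", "areas"]
tags = ["B-Label_1", "O", "O", "O", "O", "O", "B-Label_2", "I-Label_2"]
chunks = collapse_bio(tokens, tags)
for phrase, label in chunks:
    print(phrase, label)
```

With these inputs the eight tokens collapse to seven entries, with "related areas" carrying one Label_2, which is the shape asked for in the question.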