Iterating over spaCy tokens and extracting BILOU tags
How should I annotate the following sentence with BILOU tags?
I have a function called get_dataset2.
It is meant to return the tokens, POS tags, and BILOU tags for a sentence, but I am stuck on the BILOU tags.
Function:
def get_dataset2(sent):
    head_entity = ""
    candidate_entity = ""
    prv_tok_dep = ""
    prv_tok_text = ""
    prefix = ""
    words_ = []
    label_ = []
    tags_ = []
    doc = nlp(sent)
    for tok in doc:
        words_.append(tok.text)
        label_.append(tok.pos_)
        if(tok.text=='JUDGMENT'):
            tags_.append('O')
            next_token1 = doc[tok.i+1]
            #next_tok_loc1 = tok.i+1
            next_token2 = doc[tok.i+2]
            #next_tok_loc2 = tok.i+2
        if(tok.text==next_token1 and (next_token2.pos_=='PUNCT' or next_token2.pos_=='NUM')):
            tags_.append('U-Parties')
        #if(next_token1.pos_=='PROPN' and next_token2.pos_=='PROPN'):
            #tags_.append('U-Parties')
        else:
            tags_.append('O')
    return (pd.DataFrame({'Token': words_, 'POS': label_,'Tags': tags_}))
Problem: get_dataset2('JUDGMENT Gajendragadkar, J. 1.')
When I pass this sentence to the function, it extracts the tokens and POS tags successfully, but not the BILOU tags.
The output should look like this:
Tokens          POS    BILOU Tags
JUDGMENT        PROPN  O
Gajendragadkar  PROPN  U-Parties
,               PUNCT  O
I want to iterate over the tokens so that, after JUDGMENT, I can inspect the second and third tokens and then assign the BILOU tags. If the entity is a single token, it should get U-Parties.
Thanks!
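One possible way to approach this, offered as a minimal sketch rather than a definitive answer: start with one 'O' tag per token, then overwrite the tags of the proper nouns that directly follow JUDGMENT. Each token then gets exactly one tag, so the three lists passed to pd.DataFrame always have the same length. The function name bilou_tags and its parameters trigger and label are hypothetical, and extending the rule to multi-token names (B-/I-/L-) is an assumption that goes beyond the single-token case described in the question. The function works on plain lists, so it can be tried without loading a spaCy model.

```python
def bilou_tags(texts, pos_tags, trigger="JUDGMENT", label="Parties"):
    """Return one BILOU tag per token (hypothetical helper, not a spaCy API).

    The run of consecutive PROPN tokens right after `trigger` is treated
    as one entity: U- for a single token, B-/I-/L- for a longer run.
    """
    tags = ["O"] * len(texts)  # default: every token is outside an entity
    for i, text in enumerate(texts):
        if text != trigger:
            continue
        # Find where the run of proper nouns after the trigger ends.
        start = i + 1
        end = start
        while end < len(texts) and pos_tags[end] == "PROPN":
            end += 1
        length = end - start
        if length == 1:
            tags[start] = f"U-{label}"
        elif length > 1:
            tags[start] = f"B-{label}"
            for j in range(start + 1, end - 1):
                tags[j] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


# Tokens and POS tags taken from the expected output in the question.
texts = ["JUDGMENT", "Gajendragadkar", ","]
pos_tags = ["PROPN", "PROPN", "PUNCT"]
print(bilou_tags(texts, pos_tags))
```

With spaCy, the two lists would come from the parsed document, for example [t.text for t in doc] and [t.pos_ for t in doc], and the returned list can go straight into the 'Tags' column. The actual POS tags depend on the model that is loaded, so the result for the full sentence may differ from the hand-written lists above.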