Iterating over spaCy tokens and extracting BILOU tags

Posted on 2025-01-21 11:13:41

How should I annotate the following sentence with BILOU tags?

I have a function called get_dataset2 that is supposed to return the tokens, POS tags, and BILOU tags for a sentence, but I am stuck on the BILOU tags.

Function:

def get_dataset2(sent):
  head_entity = ""
  candidate_entity = ""

  prv_tok_dep = ""    
  prv_tok_text = ""  

  prefix = ""
  words_ = []
  label_ = []
  tags_ = []

  doc = nlp(sent) 
  
  for tok in doc:
      words_.append(tok.text)
      label_.append(tok.pos_)

      if(tok.text=='JUDGMENT'):
          tags_.append('O')
          next_token1 = doc[tok.i+1]
          #next_tok_loc1 = tok.i+1
          next_token2 = doc[tok.i+2]
          #next_tok_loc2 = tok.i+2

      if(tok.text==next_token1 and (next_token2.pos_=='PUNCT' or next_token2.pos_=='NUM')):
          tags_.append('U-Parties')


      #if(next_token1.pos_=='PROPN' and next_token2.pos_=='PROPN'):
          #tags_.append('U-Parties')

      else:
          tags_.append('O')    

  return (pd.DataFrame({'Token': words_, 'POS': label_,'Tags': tags_}))

Problem: when I pass a sentence to the function, as in get_dataset2('JUDGMENT Gajendragadkar, J. 1.'), it extracts the tokens and POS tags successfully but not the BILOU tags.

The output should look like this:

Tokens         POS       BILOU Tags
JUDGMENT       PROPN     O
Gajendragadkar PROPN     U-Parties
,              PUNCT     O

I want to iterate over the tokens so that, once JUDGMENT is found, I can look at the second and third tokens and assign BILOU tags based on them. If the entity is a single token, it should be tagged U-Parties.

Thanks!
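
Reading the function above, three things seem to get in the way. The first `if` and the later `if`/`else` are independent, so the JUDGMENT token receives two tags and `tags_` ends up longer than `words_`. The condition `tok.text == next_token1` compares a string with a Token object instead of with its `.text`. And `next_token1` is only assigned once JUDGMENT has been seen, so any sentence that starts with another word fails before tagging begins.

What follows is a minimal sketch of one way around this, not a definitive implementation. It assumes that the entity is the unbroken run of PROPN tokens directly after the trigger word, which is a generalisation of the rule described in the question. The function name `tag_parties` and its parameters are hypothetical. It works on plain lists so that it can be checked without loading a spaCy model.

```python
from typing import List, Sequence


def tag_parties(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    trigger: str = "JUDGMENT",   # assumed trigger word, taken from the question
    label: str = "Parties",      # assumed entity label, taken from the question
) -> List[str]:
    """Return exactly one BILOU tag per token.

    Assumption (not from the original post): the entity is the run of
    consecutive PROPN tokens that immediately follows the trigger word.
    """
    # Start with 'O' everywhere so every token gets exactly one tag.
    tags = ["O"] * len(tokens)

    for i, text in enumerate(tokens):
        if text != trigger:
            continue

        # Walk forward while the following tokens are proper nouns.
        start = i + 1
        end = start
        while end < len(tokens) and pos_tags[end] == "PROPN":
            end += 1

        length = end - start
        if length == 1:
            # Single-token entity: Unit tag.
            tags[start] = f"U-{label}"
        elif length >= 2:
            # Multi-token entity: Begin, Inside..., Last.
            tags[start] = f"B-{label}"
            for k in range(start + 1, end - 1):
                tags[k] = f"I-{label}"
            tags[end - 1] = f"L-{label}"

    return tags


# The first three POS values come from the expected table in the question;
# the remaining tokens and POS values are assumed for illustration only.
tokens = ["JUDGMENT", "Gajendragadkar", ",", "J.", "1", "."]
pos_tags = ["PROPN", "PROPN", "PUNCT", "PROPN", "NUM", "PUNCT"]

for row in zip(tokens, pos_tags, tag_parties(tokens, pos_tags)):
    print(row)
```

To use this with spaCy, the two lists can be built from the parsed document, for example `[t.text for t in doc]` and `[t.pos_ for t in doc]` after `doc = nlp(sent)`, and the three lists can then be passed to `pd.DataFrame` as in the original function. The actual tokenization and POS tags depend on the loaded model, so the result for a given sentence may differ from the hand-written lists used here.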
