Iterate over spaCy tokens and extract BILOU tags

Posted on 2025-01-21 11:13:41

How should I annotate the following sentence with BILOU tags?

I have a function called get_dataset2 that is supposed to return the tokens, POS tags, and BILOU tags for a sentence, but I am stuck on the BILOU tags.

Function:

# Assumes pandas is imported as pd and nlp is an already loaded spaCy pipeline.
def get_dataset2(sent):
    head_entity = ""
    candidate_entity = ""

    prv_tok_dep = ""
    prv_tok_text = ""

    prefix = ""
    words_ = []
    label_ = []
    tags_ = []

    doc = nlp(sent)

    for tok in doc:
        words_.append(tok.text)
        label_.append(tok.pos_)

        if tok.text == 'JUDGMENT':
            tags_.append('O')
            next_token1 = doc[tok.i + 1]
            # next_tok_loc1 = tok.i + 1
            next_token2 = doc[tok.i + 2]
            # next_tok_loc2 = tok.i + 2

        if tok.text == next_token1 and (next_token2.pos_ == 'PUNCT' or next_token2.pos_ == 'NUM'):
            tags_.append('U-Parties')
        # if next_token1.pos_ == 'PROPN' and next_token2.pos_ == 'PROPN':
        #     tags_.append('U-Parties')
        else:
            tags_.append('O')

    return pd.DataFrame({'Token': words_, 'POS': label_, 'Tags': tags_})

Problem: when I call get_dataset2('JUDGMENT Gajendragadkar, J. 1.'), the function successfully extracts the tokens and POS tags, but not the BILOU tags.

The expected output is:

Tokens         POS       BILOU Tags
JUDGMENT       PROPN     O
Gajendragadkar PROPN     U-Parties
,              PUNCT     O

I want to iterate over the tokens so that, after JUDGMENT, I can look at the second and third tokens and assign the BILOU tags accordingly. If the entity is a single token, it should be tagged U-Parties.

Thanks!
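
One possible reason the tags come out wrong in the function above: `tok.text==next_token1` compares a string with a `Token` object rather than with its text, and the `JUDGMENT` token can receive two tags (one in the first `if`, one in the `else`), so `tags_` may end up longer than `words_`. A minimal sketch of an alternative follows, assuming tags are assigned by index so that every token gets exactly one tag. The helper name `bilou_tags_after_trigger`, its parameters, and the hand-written sample lists are hypothetical; only the first three token/POS pairs come from the expected output above, the rest are assumed for illustration.

```python
def bilou_tags_after_trigger(words, pos_tags, trigger="JUDGMENT", label="Parties"):
    """Return one BILOU tag per token.

    The token right after `trigger` is tagged U-<label> when the token
    after that is punctuation or a number (the single-token case from
    the question). Everything else stays "O".
    """
    # Start with "O" everywhere so the result always matches len(words).
    tags = ["O"] * len(words)
    for i, word in enumerate(words):
        if word != trigger:
            continue
        # Only look ahead when two following tokens actually exist.
        if i + 2 < len(words) and pos_tags[i + 2] in ("PUNCT", "NUM"):
            tags[i + 1] = "U-" + label
    return tags


# Hand-written sample input (assumed tokenization and POS tags).
words = ["JUDGMENT", "Gajendragadkar", ",", "J.", "1", "."]
pos = ["PROPN", "PROPN", "PUNCT", "PROPN", "NUM", "PUNCT"]
tags = bilou_tags_after_trigger(words, pos)

for row in zip(words, pos, tags):
    print(row)
```

With spaCy, the two input lists could be built from a parsed `doc` as `[t.text for t in doc]` and `[t.pos_ for t in doc]`, and the three lists can then be passed to `pd.DataFrame` as in the original function. The actual tokenization and POS tags depend on the loaded model, so the result may differ from the sample above.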
