REBEL: Relation Extraction By End-to-end Language Generation
I am trying to run the following code:
from transformers import pipeline
triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')
# We need to use the tokenizer manually since we need special tokens.
extracted_text = triplet_extractor.tokenizer.batch_decode([triplet_extractor("Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic", return_tensors=True, return_text=False)[0]["generated_token_ids"]])
print(extracted_text[0])
# Function to parse the generated text and extract the triplets
def extract_triplets(text):
    triplets = []
    relation, subject, relation, object_ = '', '', '', ''
    text = text.strip()
    current = 'x'
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})
    return triplets
extracted_triplets = extract_triplets(extracted_text[0])
print(extracted_triplets)
Unfortunately, I get the following error:
TypeError: Can't convert {'output_ids': [[0, 50267, 221, 20339, 2615, 102, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 19664, 1780, 219, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 18978, 3497, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 6308, 6833, 15752, 10014, 2]]} to Sequence
Can anyone provide a solution for this error?
Comments (1)
I can run the code successfully without any error (please find the screenshot attached). Maybe you can check which transformers version you have installed?
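A quick way to check is to print the installed release; I can't confirm which exact version fixes this, but the pipeline's return format for return_tensors=True may differ between releases, which would explain the TypeError:

# Print the installed transformers release so it can be compared against the
# version the model card / notebook was written for.
import transformers
print(transformers.__version__)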
I also have a Colab notebook that uses the transformers model and tokenizer functions directly; maybe you can use that approach instead:
https://colab.research.google.com/drive/12gwdua-Fs7H31HIc5frzavT6ofyeLWux?usp=sharing
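Roughly, calling the model and tokenizer directly (bypassing the pipeline) looks like the sketch below; the generation settings such as max_length and num_beams are illustrative assumptions, not values taken from the notebook, and it reuses the extract_triplets function from your question:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load REBEL without the pipeline wrapper so we control decoding ourselves.
tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

text = ("Punta Cana is a resort town in the municipality of Higuey, "
        "in La Altagracia Province, the eastern most province of the Dominican Republic")

# Tokenize the input sentence and generate; beam-search settings are illustrative.
model_inputs = tokenizer(text, max_length=256, truncation=True, return_tensors="pt")
generated_tokens = model.generate(
    model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],
    max_length=256,
    num_beams=3,
)

# Keep the special tokens (<triplet>, <subj>, <obj>) so extract_triplets can parse them.
decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=False)
for sentence in decoded_preds:
    print(extract_triplets(sentence))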
I hope this helps. Thanks!