REBEL: end-to-end language relation extraction

Published on 2025-02-13 16:20:41


I am trying to run the following piece of code:

from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')
# We need to use the tokenizer manually since we need special tokens.
extracted_text = triplet_extractor.tokenizer.batch_decode(
    [triplet_extractor(
        "Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic",
        return_tensors=True,
        return_text=False,
    )[0]["generated_token_ids"]]
)
print(extracted_text[0])
# Function to parse the generated text and extract the triplets
def extract_triplets(text):
    # The model linearizes each triplet as "<triplet> head <subj> tail <obj> relation",
    # so tokens after <subj> belong to the tail entity and tokens after <obj> to the relation.
    triplets = []
    subject, relation, object_ = '', '', ''
    text = text.strip()
    current = 'x'
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
    return triplets
extracted_triplets = extract_triplets(extracted_text[0])
print(extracted_triplets)

Unfortunately, I get the following error:

TypeError: Can't convert {'output_ids': [[0, 50267, 221, 20339, 2615, 102, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 19664, 1780, 219, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 18978, 3497, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 6308, 6833, 15752, 10014, 2]]} to Sequence

Can anyone provide a solution for this error?

Comments (1)

绮筵 2025-02-20 16:20:41


I can run the code successfully without any error (please find the screenshot below). Maybe you can check your transformers version?
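For reference, a quick way to check which version is installed:

import transformers
print(transformers.__version__)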

I have a Colab notebook that uses the transformers model and tokenizer functions directly; maybe you can use that approach instead:
https://colab.research.google.com/drive/12gwdua-Fs7H31HIc5frzavT6ofyeLWux?usp=sharing
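In case the link is not accessible, here is a minimal sketch of that direct approach, roughly following the usage example on the Babelscape/rebel-large model card; the generation settings below are illustrative choices, and extract_triplets is the parsing function from the question:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

text = "Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic"

# Illustrative generation settings; tune as needed.
gen_kwargs = {
    "max_length": 256,
    "length_penalty": 0,
    "num_beams": 3,
    "num_return_sequences": 1,
}

# Tokenize and generate, then decode WITHOUT skipping special tokens,
# since extract_triplets relies on <triplet>, <subj> and <obj>.
model_inputs = tokenizer(text, max_length=256, truncation=True, return_tensors="pt")
generated_tokens = model.generate(
    model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],
    **gen_kwargs,
)
decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=False)

for sentence in decoded_preds:
    print(extract_triplets(sentence))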

I hope this helps, Thanks!

[Screenshot: the code running without an error]
