Applying the pre-trained facebook/bart-large-cnn model to a dataframe column for text summarization in Python

Posted on 2025-01-13 00:31:05

I am working with Hugging Face Transformers (the summarization pipeline) and have gained some familiarity with it. I am using the facebook/bart-large-cnn model to perform text summarization, and I am running the code below:

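from transformers import pipeline

# Name the model explicitly; pipeline("summarization") alone loads the library's default checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Good Morning team, I need a help in terms of one of the functions that needs to be written on the servers.. please let me know wen are you available.. Thanks , hgjhghjgjh, 193-6757-568"
# Note: min_length and max_length are counted in tokens, while len(text) counts characters.
print(summarizer(text, min_length=int(0.1 * len(text)), max_length=int(0.2 * len(text)), do_sample=False))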

My question is: how can I apply the same pre-trained model to a column of my dataframe? The dataframe looks like this:

ID       Text
1          some long text here...
2          some long text here...
3          some long text here...
.... and so on for 100K rows

Now I want to apply the pre-trained model to the Text column to generate a new column, df['Summary_Text'], so that the resulting dataframe looks like this:

ID          Text                              Summary_Text
1          some long text here...           Text summary goes here...
2          some long text here...           Text summary goes here...
3          some long text here...           Text summary goes here...

How can I get this? Any quick help would be highly appreciated.

划一舟意中人 2025-01-20 00:31:05

I am working along the same lines, trying to summarize news articles.
You can pass either a single string or a list of strings to the pipeline. First convert your dataframe's 'Text' column to a list:

input_col = df['Text'].to_list()

Then feed it to your model:

from transformers import pipeline
summarizer = pipeline("summarization")

# `text` is the sample string defined in the question; the resulting limits (in tokens) apply to every row.
res = summarizer(input_col, min_length=int(0.1 * len(str(text))), max_length=int(0.2 * len(str(text))), do_sample=False)
print(res[0]['summary_text'])

This returns a list with one dict per input and prints only the first summary. You can iterate over the list (res[0]['summary_text'], res[1]['summary_text'], and so on), collect the summaries, and add them back as a dataframe column.

# Collect the summary string from each result, in the same order as the input rows.
df_res = [r['summary_text'] for r in res]

df['Summary_Text'] = df_res

If your articles are long, pass truncation=True to the summarizer call (alongside min_length and the other arguments).
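With 100K rows it may be more practical to feed the list in chunks rather than all at once, so that progress is visible and a failure does not lose everything. Below is a minimal sketch of that idea, not a definitive implementation. The names summarize_in_chunks, chunk_size, and fake_summarizer are hypothetical, and the stand-in summarizer only exists so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence


def summarize_in_chunks(
    texts: Sequence[str],
    summarizer: Callable[..., List[Dict[str, str]]],
    chunk_size: int = 32,
    **call_kwargs,
) -> List[str]:
    """Summarize texts chunk by chunk and return the summaries in input order."""
    summaries: List[str] = []
    for start in range(0, len(texts), chunk_size):
        # Cast to str so stray non-string cells do not break the call.
        chunk = [str(t) for t in texts[start:start + chunk_size]]
        # A summarization pipeline returns one {'summary_text': ...} dict per input.
        results = summarizer(chunk, **call_kwargs)
        summaries.extend(r["summary_text"] for r in results)
    return summaries


# Hypothetical stand-in for the real pipeline: it just keeps the first 20 characters.
def fake_summarizer(batch, **kwargs):
    return [{"summary_text": t[:20]} for t in batch]


demo = summarize_in_chunks(
    ["first long text here", "second long text here", "third"],
    fake_summarizer,
    chunk_size=2,
)
print(demo)

# With the real objects from this answer, the call would look like:
# df["Summary_Text"] = summarize_in_chunks(
#     df["Text"].to_list(), summarizer,
#     min_length=10, max_length=100, do_sample=False, truncation=True,
# )
```

The min_length and max_length values in the commented call are placeholders; pick limits that suit your texts.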

This will take a long time on a CPU. I am looking for faster alternatives myself. For me, XLNet is a usable option for now. Hope this helps!
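One thing that may help with speed, if a CUDA GPU is available, is passing a device to the pipeline and batching the inputs. The sketch below is only an assumption about your setup: pick_device is a hypothetical helper, and the batch size of 8 in the commented usage is an arbitrary starting point.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU setting.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("pipeline device:", device)

# Assumed usage with the list built earlier in this answer:
# from transformers import pipeline
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
# res = summarizer(input_col, batch_size=8, truncation=True, do_sample=False)
```

In the pipeline API, device=-1 selects the CPU and a non-negative integer selects that GPU index.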

紫竹語嫣☆ 2025-01-20 00:31:05

This is my code to iterate through the Excel rows of column X and write the summaries into another column Y. I hope it helps.

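from transformers import pipeline
import openpyxl

path = "workbook.xlsx"  # hypothetical file name; the original does not give the path
wb = openpyxl.load_workbook(path, read_only=False)
ws = wb["sheet"]
bart_summarizer = pipeline("summarization")
# Column 8 holds the text to summarize; the summary goes to column 10 of the same row.
for row in ws.iter_rows(min_col=8, min_row=2, max_col=8, max_row=5):
    for cell in row:
        text_to_summarize = cell.value
        summary = bart_summarizer(text_to_summarize, min_length=10, max_length=100)
        # The pipeline returns a list with one dict per input; store only the summary string.
        ws.cell(row=cell.row, column=10).value = summary[0]["summary_text"]
# Save once to the file path, after all rows have been processed.
wb.save(path)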
