Applying the pre-trained facebook/bart-large-cnn model to a dataframe column for text summarization in Python
I am working with Hugging Face Transformers summarizers and have gained some familiarity with them. I am using the facebook/bart-large-cnn model to perform text summarization, and I am running the code below:
from transformers import pipeline
# Name the model explicitly; pipeline("summarization") alone loads the library's default checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Good Morning team, I need a help in terms of one of the functions that needs to be written on the servers.. please let me know wen are you available.. Thanks , hgjhghjgjh, 193-6757-568"
# min_length and max_length are counted in tokens, so these character-based values are only rough bounds.
print(summarizer(str(text), min_length=int(0.1 * len(str(text))), max_length=int(0.2 * len(str(text))), do_sample=False))
My question is how to apply the same pre-trained model to a column of my dataframe. My dataframe looks like this:
ID  Text
1   some long text here...
2   some long text here...
3   some long text here...
... and so on for 100K rows
I want to apply the pre-trained model to the Text column to generate a new column, df['Summary_Text'], so that the resulting dataframe looks like this:
ID  Text                    Summary_Text
1   some long text here...  Text summary goes here...
2   some long text here...  Text summary goes here...
3   some long text here...  Text summary goes here...
How can I do this? Any quick help would be highly appreciated.
Comments (2)
I am working along the same lines, trying to summarize news articles.
You can pass either a single string or a list of strings to the model. First convert your dataframe's 'Text' column to a list.
Then feed it to your model.
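A minimal sketch of these two steps, assuming a dataframe with a Text column as in the question. The first_words function is a made-up stand-in that mimics the pipeline's return shape (a list of dicts with a 'summary_text' key) so the example runs without downloading a model; the commented lines show where the real pipeline would go.

```python
import pandas as pd


def first_words(texts, n_words=5, **_ignored):
    """Stand-in for the real pipeline: keeps the first n_words of each text.

    It returns a list of {'summary_text': ...} dicts, one per input,
    which is the shape a summarization pipeline returns for a list.
    """
    return [{"summary_text": " ".join(t.split()[:n_words])} for t in texts]


# Example data (assumption: two short rows instead of the real 100K).
df = pd.DataFrame({
    "ID": [1, 2],
    "Text": [
        "Good morning team, I need help with a server function.",
        "Please let me know when you are available. Thanks.",
    ],
})

# Step 1: dataframe column -> plain Python list of strings.
texts = df["Text"].astype(str).tolist()

# Step 2: pass the whole list to the summarizer in one call.
# With the real model this would be:
#   from transformers import pipeline
#   summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer = first_words
res = summarizer(texts, do_sample=False)

# Only the first result is printed here; res holds one dict per row.
print(res[0]["summary_text"])
```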
This returns a list, of which only the first output is printed. You can iterate over the list (res[1]['summary_text'], res[2]['summary_text'], and so on), store the summaries, and add them back as a dataframe column.
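One way to collect the summaries and attach them, sketched under the assumption that the results are in row order and that there is exactly one per row. The helper name add_summary_column is hypothetical, and the results below are written by hand in place of real model output.

```python
import pandas as pd


def add_summary_column(df, results, column="Summary_Text"):
    """Attach summarizer results to a copy of the dataframe as a new column.

    `results` is expected to be a list of dicts with a 'summary_text' key,
    one per row and in the same order as the rows.
    """
    if len(results) != len(df):
        raise ValueError("need exactly one result per dataframe row")
    out = df.copy()  # leave the caller's dataframe untouched
    out[column] = [r["summary_text"] for r in results]
    return out


# Hand-written results standing in for model output.
df = pd.DataFrame({"ID": [1, 2], "Text": ["first long text", "second long text"]})
res = [{"summary_text": "first"}, {"summary_text": "second"}]

print(add_summary_column(df, res))
```

Building the column from a list comprehension avoids indexing res[0], res[1], ... by hand, and the length check catches a mismatch before it silently misaligns rows.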
If your articles are long, pass truncation=True to the summarizer call, alongside min_length and the other parameters.
This will take a long time on a CPU. I am looking for faster alternatives myself; for now, XLNet is a usable option for me. Hope this helps!
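One common way to cut the run time, offered as an assumption rather than something tested in this answer, is to run the pipeline on a GPU and work through the list in chunks. In this sketch build_summarizer and summarize_in_chunks are hypothetical helper names and chunk_size=64 is an arbitrary choice; device is the pipeline's own argument, where -1 selects the CPU and 0 the first CUDA GPU.

```python
def build_summarizer(device=-1):
    """Create the summarization pipeline on the given device."""
    # Imported lazily so the chunking helper below can be used and tested
    # without transformers installed.
    from transformers import pipeline
    return pipeline("summarization", model="facebook/bart-large-cnn", device=device)


def summarize_in_chunks(texts, summarizer, chunk_size=64, **kwargs):
    """Summarize a long list chunk by chunk and return the summaries in order.

    Working in chunks keeps each call small and makes it easy to add
    progress reporting or intermediate saves between chunks.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    summaries = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results = summarizer(chunk, **kwargs)
        summaries.extend(r["summary_text"] for r in results)
    return summaries


# Possible use on a machine with a CUDA GPU:
#   summarizer = build_summarizer(device=0)
#   df["Summary_Text"] = summarize_in_chunks(
#       df["Text"].astype(str).tolist(), summarizer,
#       truncation=True, do_sample=False, batch_size=8,
#   )
```

Passing batch_size at call time lets the pipeline group inputs on the device; whether that helps depends on the hardware and the text lengths, so it is worth timing on a small sample first.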
This is my code to iterate through the Excel rows of column X and write each summary to another column, Y. I hope it helps.
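The commenter's code is not shown, so what follows is only a sketch of the approach described, not their implementation. It assumes the columns are literally named X and Y, that empty or non-text cells should get an empty summary, and that the summarizer returns a one-element list of dicts when given a single string, as a Hugging Face summarization pipeline does. The function names are hypothetical.

```python
import pandas as pd


def summarize_rows(df, summarizer, source_col="X", target_col="Y", **kwargs):
    """Summarize source_col one row at a time and write results to target_col."""
    out = df.copy()
    summaries = []
    for text in out[source_col]:
        if isinstance(text, str) and text.strip():
            # A single string in gives a one-element list of dicts back.
            summaries.append(summarizer(text, **kwargs)[0]["summary_text"])
        else:
            summaries.append("")  # empty or non-text cell: nothing to summarize
    out[target_col] = summaries
    return out


def summarize_excel(in_path, out_path, summarizer, **kwargs):
    """Read a workbook, summarize column X into column Y, and save the result."""
    df = pd.read_excel(in_path)  # needs an Excel engine such as openpyxl
    summarize_rows(df, summarizer, **kwargs).to_excel(out_path, index=False)


# Possible use (the file names are placeholders):
#   from transformers import pipeline
#   summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
#   summarize_excel("input.xlsx", "output.xlsx", summarizer, truncation=True)
```

Row-by-row calls are the simplest to follow but also the slowest; passing the whole column as a list, as the first comment suggests, is usually the better fit for large sheets.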