Using a PyTorch Dataset for model inference on GPU


I am running T5-base-grammar-correction to perform grammar correction on the text column of my dataframe:

from happytransformer import HappyTextToText
from happytransformer import TTSettings
from tqdm.notebook import tqdm
tqdm.pandas()

# Load the local t5-base-grammar-correction checkpoint
happy_tt = HappyTextToText("T5", "./t5-base-grammar-correction")
beam_settings = TTSettings(num_beams=5, min_length=1, max_length=30)

def grammar_pipeline(text):
    # The model expects the "gec: " task prefix before each input
    text = "gec: " + text
    result = happy_tt.generate_text(text, args=beam_settings)
    return result.text

df['new_text'] = df['original_text'].progress_apply(grammar_pipeline)

The pandas apply function runs and produces the required results, but it is quite slow.

I also get the warning below while executing the code:

/home/.local/lib/python3.6/site-packages/transformers/pipelines/base.py:908: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
  UserWarning,

I have access to a GPU. Can somebody provide some pointers to speed up the execution and utilise the full capabilities of the GPU?
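
If I understand the warning correctly, it is pointing at the batching support built into the transformers pipeline API: a pipeline placed on the GPU can consume a whole list (or dataset) of inputs with a batch_size, instead of one call per row. A rough sketch of that pattern, assuming a reasonably recent transformers version and the same local checkpoint (device=0 and batch_size=32 are illustrative values, not tuned ones):

from transformers import pipeline

# Text2text pipeline on the first CUDA device; batching happens internally
pipe = pipeline("text2text-generation",
                model="./t5-base-grammar-correction",
                device=0)

texts = ["gec: " + t for t in df["original_text"]]
outputs = pipe(texts, batch_size=32, num_beams=5, max_length=30)
df["new_text"] = [o["generated_text"] for o in outputs]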

--------------------------------EDIT---------------------------------

I tried using a PyTorch Dataset in the way shown below, but the processing is still slow:

from torch.utils.data import Dataset, DataLoader

class CustomD(Dataset):

    def __init__(self, text):
        self.text = text
        self.len = text.shape[0]

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        # Generation runs here, one sample at a time, so the
        # DataLoader's batch_size does not batch the model call
        text = self.text.iloc[idx]
        text = "gec: " + text
        result = happy_tt.generate_text(text, args=beam_settings)
        return result.text

TD = CustomD(df.original_text)
final_data = DataLoader(dataset=TD,
                        batch_size=10,
                        shuffle=False)
list_modified = []
for (idx, batch) in enumerate(final_data):
    list_modified.append(batch)

# Each batch is a list of strings, so flatten the batches into one list
flat_list = [item for sublist in list_modified for item in sublist]
df["new_text"] = flat_list
