Slow prediction speed for the translation model opus-mt-en-ro

Posted on 2025-01-20 18:52:54

I'm using the Helsinki-NLP/opus-mt-en-ro model from Hugging Face.
To generate translations, I use the following code:

    inputs = tokenizer(
            questions,
            max_length=max_input_length,
            truncation=True,
            return_tensors='pt',
            padding=True).to('cuda')
    translation = model.generate(**inputs)

For small inputs (i.e. a small number of sentences in questions), it works fine. However, when the number of sentences increases (e.g., batch size = 128), generation becomes very slow.
I have a dataset of 100K examples that I need to translate. How can I make this faster? (I already checked GPU utilization, and it varies between 25% and 70%.)

Update: Following dennlinger's comment, here is some additional information:

Average question length: around 30 tokens.
Definition of slowness: a batch of 128 questions takes around 25 seconds, so my dataset of 100K examples would take more than 5 hours. I'm using an Nvidia V100 GPU (16GB), hence the to('cuda') in the code. I cannot increase the batch size because that results in an out-of-memory error.
I haven't tried different generation parameters, but as far as I know, the number of beams defaults to 1.
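
For reference, the 5-hour figure is simple arithmetic on the numbers above. This is a minimal sketch, assuming every batch takes about the same 25 seconds; the function name estimated_hours is only illustrative and not part of any library:

```python
import math


def estimated_hours(num_examples, batch_size, seconds_per_batch):
    """Rough wall-clock estimate, assuming a constant time per batch."""
    # The last batch may be partial, so round the batch count up.
    num_batches = math.ceil(num_examples / batch_size)
    return num_batches * seconds_per_batch / 3600


# Numbers from this post: 100K examples, batches of 128, ~25 s per batch.
hours = estimated_hours(100_000, 128, 25)
print(f"{hours:.1f} hours")
```

In practice the time per batch will vary with the length of the longest sentence in each padded batch, so this is only a ballpark.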
