Hugging Face Transformers: padding vs pad_to_max_length

Posted on 2025-02-09 18:35:44


I'm running code with pad_to_max_length=True and everything works fine, except that I get the following warning:

FutureWarning: The pad_to_max_length argument is deprecated and
will be removed in a future version, use padding=True or
padding='longest' to pad to the longest sequence in the batch, or
use padding='max_length' to pad to a max length. In this case, you
can give a specific length with max_length (e.g. max_length=45) or
leave max_length to None to pad to the maximal input size of the model
(e.g. 512 for Bert).
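To make the options in the warning concrete, here is a toy, pure-Python model of the two padding strategies. It is a sketch under stated assumptions, not Hugging Face code: `pad_batch` and `PAD_ID` are hypothetical names, token ids are plain lists instead of tensors, and the real library's fallback (padding to the model's maximum input size when `max_length` is None) is not modeled. The thing to notice is that padding only ever lengthens a sequence.

```python
from typing import List, Optional

PAD_ID = 0  # hypothetical pad token id; a real tokenizer exposes tokenizer.pad_token_id


def pad_batch(batch: List[List[int]], padding: str, max_length: Optional[int] = None) -> List[List[int]]:
    """Toy model of the padding strategies named in the warning (hypothetical helper).

    padding='longest'    -> pad every sequence to the longest one in the batch
    padding='max_length' -> pad every sequence to max_length
    Padding never shortens a sequence; that is what truncation is for.
    """
    if padding == "longest":
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        if max_length is None:
            # The real library would fall back to the model's max input size here.
            raise ValueError("this sketch needs an explicit max_length")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    # Right-pad each sequence; a sequence already longer than target is left unchanged
    # because multiplying a list by a negative number yields [].
    return [seq + [PAD_ID] * (target - len(seq)) for seq in batch]


batch = [[101, 7, 8, 102], [101, 7, 8, 9, 10, 102]]
print([len(s) for s in pad_batch(batch, "longest")])        # [6, 6]
print([len(s) for s in pad_batch(batch, "max_length", 5)])  # [5, 6]
```

The second call shows that padding='max_length' on its own does not guarantee equal lengths: the six-token sequence stays at six.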

But when I replace pad_to_max_length=True with padding='max_length', I get this error:

RuntimeError: stack expects each tensor to be equal size, but got [60] at entry 0 and [64] at entry 6
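This error is most likely raised when a PyTorch DataLoader batches the per-example tensors with torch.stack, which requires every tensor to have the same shape. The sketch below mimics that check in plain Python. `pad_only` and `stack_like` are hypothetical helpers, and the token counts (six 40-token poems plus one 64-token poem, max_length=60) are invented to reproduce the shape of the message, not taken from the asker's data.

```python
from typing import List, Optional


def pad_only(ids: List[int], max_length: int, pad_id: int = 0) -> List[int]:
    """Hypothetical helper: padding='max_length' WITHOUT truncation.

    Short inputs are right-padded to max_length; longer inputs come back unchanged.
    """
    return ids + [pad_id] * (max_length - len(ids))


def stack_like(rows: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper mimicking torch.stack's equal-size requirement."""
    expected = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != expected:
            raise RuntimeError(
                "stack expects each tensor to be equal size, "
                f"but got [{expected}] at entry 0 and [{len(row)}] at entry {i}"
            )
    return rows


MAX_LENGTH = 60
short_poem = list(range(40))  # 40 token ids: fits within MAX_LENGTH
long_poem = list(range(64))   # 64 token ids: longer than MAX_LENGTH

# A batch of seven examples where only the last one is too long.
batch = [pad_only(short_poem, MAX_LENGTH) for _ in range(6)] + [pad_only(long_poem, MAX_LENGTH)]

message: Optional[str] = None
try:
    stack_like(batch)
except RuntimeError as err:
    message = str(err)
print(message)

# Truncating before padding (simplified: real truncation preserves special tokens)
# makes every row exactly MAX_LENGTH long, so the stack check passes.
fixed = [pad_only(p[:MAX_LENGTH], MAX_LENGTH) for p in [short_poem] * 6 + [long_poem]]
stack_like(fixed)
```

Running it prints the same style of message as above, ending in "[60] at entry 0 and [64] at entry 6".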

How should I change the code for the new version? Did I misunderstand something in the warning?

This is my encoder:

encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=60,
    return_token_type_ids=False,
    pad_to_max_length=True,
    return_attention_mask=True,
    return_tensors='pt',
)


む无字情书 2025-02-16 18:35:44


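It seems the documentation is incomplete!

You should also add truncation=True to mimic pad_to_max_length=True. With the old argument, padding stayed at its default, and in that case the tokenizer fell back to truncating because max_length was set. Once you pass padding='max_length' explicitly, that fallback no longer applies, so inputs longer than max_length (64 tokens in your error) are padded but not truncated, and the batch can no longer be stacked.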

Like this:

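encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=self.max_len,      # e.g. 60, as in the question
    return_token_type_ids=False,
    padding='max_length',         # replaces pad_to_max_length=True
    truncation=True,              # cut longer inputs so every example has exactly max_length tokens
    return_attention_mask=True,
    return_tensors='pt',
)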
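Why truncation=True is needed can be modeled with a simplified reconstruction of the backward-compatibility rules that, to my understanding, older transformers releases applied when resolving padding and truncation. `resolve_strategies` is a hypothetical helper written for illustration, not a library function; the real logic has more branches and may differ between versions, so treat this as an assumption to check against your installed version.

```python
from typing import Optional, Tuple, Union


def resolve_strategies(
    padding: Union[bool, str] = False,
    truncation: Optional[Union[bool, str]] = None,
    max_length: Optional[int] = None,
    pad_to_max_length: bool = False,
) -> Tuple[str, str]:
    """Hypothetical, simplified model of legacy padding/truncation resolution.

    Returns (padding_strategy, truncation_strategy).
    """
    # Assumed legacy rule: if max_length is given while padding is left at its
    # default (False) and truncation is unspecified, truncation becomes 'longest_first'.
    if max_length is not None and padding is False and truncation is None:
        truncation = "longest_first"

    # Deprecated flag: pad_to_max_length=True only takes effect when padding is left at False.
    if padding is False and pad_to_max_length:
        padding_strategy = "max_length" if max_length is not None else "longest"
    elif padding is True:
        padding_strategy = "longest"
    elif padding is False:
        padding_strategy = "do_not_pad"
    else:
        padding_strategy = padding  # e.g. 'longest' or 'max_length'

    if truncation is True:
        truncation_strategy = "longest_first"
    elif truncation is None or truncation is False:
        truncation_strategy = "do_not_truncate"
    else:
        truncation_strategy = truncation
    return padding_strategy, truncation_strategy


# Old call from the question: padded AND (implicitly) truncated.
print(resolve_strategies(max_length=60, pad_to_max_length=True))
# New call without truncation: long inputs keep their length.
print(resolve_strategies(padding="max_length", max_length=60))
# The call from this answer: padded and truncated again.
print(resolve_strategies(padding="max_length", truncation=True, max_length=60))
```

Under these assumptions the three calls resolve to ('max_length', 'longest_first'), ('max_length', 'do_not_truncate') and ('max_length', 'longest_first'), which matches the behavior described in the question.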
