create_padding_mask in the Transformer code uses the encoder input sequence to create the padding mask for the 2nd attention block in the decoder
I am going through the Transformer code on tensorflow.org - https://www.tensorflow.org/text/tutorials/transformer
def create_masks(self, inp, tar):
    # Encoder padding mask (Used in the 2nd attention block in the decoder too.)
    padding_mask = create_padding_mask(inp)

    # Used in the 1st attention block in the decoder.
    # It is used to pad and mask future tokens in the input received by
    # the decoder.
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    look_ahead_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return padding_mask, look_ahead_mask
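For reference, the two helpers used above are defined earlier in the tutorial roughly as follows (a sketch reproduced from that version of the tutorial, with the resulting shapes noted in comments):

import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 wherever the token id equals 0 (the pad id in the tutorial), 0.0 elsewhere.
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Extra dimensions let the mask broadcast onto the attention logits:
    # (batch_size, 1, 1, seq_len)
    return seq[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    # Strictly upper-triangular 1s: position i may not attend to positions j > i.
    # Shape: (seq_len, seq_len)
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)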
The Transformer class has a method called create_masks which creates the padding and look-ahead masks.
I understand that the encoder's padding mask should be built from the input sequence (the input to the encoder). What I do not understand is why the encoder's input sequence is also used to create the padding mask for the second attention block of the decoder (the first line of the method body). I would have thought the decoder's padding mask should be created from the target sequence (which is fed to the decoder).
Please help me understand why this is done.
1 Answer
First, you should understand that the padding mask in the encoder is applied along the key axis of the attention scores (if you view the attention scores as a 2-D matrix, that is the j, or second, axis).
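Concretely, in the tutorial the mask is added (scaled by a large negative number) to the attention logits, whose last axis is the key axis, so every 1 in the mask silences one key position for all queries at once. A minimal sketch with made-up shapes:

import tensorflow as tf

# Toy attention scores: batch=1, heads=1, 3 query positions, 4 key positions.
scores = tf.random.normal((1, 1, 3, 4))      # (..., seq_len_q, seq_len_k)
mask = tf.constant([[[[0., 0., 1., 1.]]]])   # (batch, 1, 1, seq_len_k): last two keys are padding

# A large negative value on masked key positions drives their softmax
# weight to ~0 for every query row.
scores += mask * -1e9
weights = tf.nn.softmax(scores, axis=-1)
print(weights[0, 0])                         # each row ends in two ~0 entries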
Then, coming to the decoder: the encoder output provides the keys (and values) to the decoder's second attention layer, while the target input only provides the queries, through the decoder's first attention layer. Since the padding mask is applied along the key axis, it must have the same length as the source sentence.
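So in the decoder's cross-attention block the queries come from the target side while the keys come from the encoder output, which is why the mask must match the source length, not the target length. A sketch with made-up lengths (inp_len=7, tar_len=5) showing how the mask built from inp lines up with the key axis of the cross-attention logits:

import tensorflow as tf

batch, num_heads, depth = 2, 2, 8
inp_len, tar_len = 7, 5                # hypothetical source and target lengths

# Source token ids with trailing padding (0 is the pad id in the tutorial).
inp = tf.constant([[5, 4, 3, 2, 1, 0, 0],
                   [9, 8, 7, 0, 0, 0, 0]])
padding_mask = tf.cast(tf.math.equal(inp, 0), tf.float32)[:, tf.newaxis, tf.newaxis, :]
# padding_mask shape: (batch, 1, 1, inp_len)

# Cross-attention: queries come from the decoder, keys from the encoder output.
q = tf.random.normal((batch, num_heads, tar_len, depth))
k = tf.random.normal((batch, num_heads, inp_len, depth))
logits = tf.matmul(q, k, transpose_b=True)   # (batch, num_heads, tar_len, inp_len)

# The mask's last axis matches the key axis (inp_len), so it hides padded
# *source* positions for every target position.
logits += padding_mask * -1e9
weights = tf.nn.softmax(logits, axis=-1)
print(weights.shape)                         # (2, 2, 5, 7)

Because the mask broadcasts over the head and target-position axes, the single (batch, 1, 1, inp_len) mask returned by create_padding_mask(inp) is all the cross-attention layer needs.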