TensorFlow seq2seq: tensorflow.python.framework.errors_impl.InvalidArgumentError

Posted 2025-01-10 08:49:41


I am following the seq2seq translation tutorial at https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt#define_the_optimizer_and_the_loss_function quite closely, while testing it on other data. I get an error when calling the Encoder, which is defined as follows:

import tensorflow as tf

class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    ##-------- LSTM layer in Encoder ------- ##
    self.lstm_layer = tf.keras.layers.LSTM(self.enc_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, h, c = self.lstm_layer(x, initial_state = hidden)
    return output, h, c

  def initialize_hidden_state(self):
    return [tf.zeros((self.batch_sz, self.enc_units)), tf.zeros((self.batch_sz, self.enc_units))]

It fails when it is tested here:

# Test Encoder Stack
encoder = Encoder(vocab_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_h, sample_c = encoder(example_input_batch, sample_hidden)

The error is the following:

Traceback (most recent call last):
  File "C:/Users/Seq2seq/Seq2seq-V3.py", line 132, in <module>
    sample_output, sample_h, sample_c = encoder(example_input_batch, sample_hidden)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:/Users/Seq2seq/Seq2seq-V3.py", line 119, in call
    x = self.embedding(x)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "embedding" (type Embedding).

indices[12,148] = 106 is not in [0, 106) [Op:ResourceGather]

Call arguments received:
  • inputs=tf.Tensor(shape=(64, 200), dtype=int64)

TF 2.0

Might this be a problem in TF Addons? Do you have any experience with that?

EDIT

The tutorial tokenizes at the word level; I encode the text at the character level, and 106 is my vocab_size (the number of distinct characters).
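
To make the range check concrete, here is a minimal sketch (not the original script; the value 106 is taken from the error message above): a Keras Embedding layer built with input_dim = vocab_size only holds rows 0..vocab_size-1, so any index equal to vocab_size triggers exactly this InvalidArgumentError.

import tensorflow as tf

# An Embedding with input_dim=106 owns rows 0..105; index 106 is out of range.
vocab_size = 106
embedding = tf.keras.layers.Embedding(vocab_size, 8)

ok = tf.constant([[0, 5, 105]])   # all indices < 106: fine
bad = tf.constant([[0, 5, 106]])  # 106 reproduces the InvalidArgumentError

print(embedding(ok).shape)        # (1, 3, 8)
try:
    embedding(bad)
except tf.errors.InvalidArgumentError as e:
    print(e.message)              # "... 106 is not in [0, 106) ..."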

Comments (2)

靑春怀旧 2025-01-17 08:49:42

This is enough of a hint, in fact:

indices[12,148] = 106 is not in [0, 106) [Op:ResourceGather]

I had to make sure my vocabulary size is vocab_size = len(vocab)+1. The dataset construction now goes:

text = open(FILE_PATH, 'rb').read().decode(encoding='utf-8') 
vocab = sorted(set(text))

# [...]

vocab_size = len(vocab)+1
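
For context, a hedged sketch of the fix wired into the Encoder above (the sample text, the char_to_idx mapping, and the layer sizes are illustrative, not from the original script): with character ids starting at 1 and 0 reserved for padding, the largest id equals len(vocab), so the Embedding needs input_dim = len(vocab) + 1.

import tensorflow as tf

text = "hello world"                                   # illustrative corpus
vocab = sorted(set(text))
char_to_idx = {c: i + 1 for i, c in enumerate(vocab)}  # 0 reserved for padding

vocab_size = len(vocab) + 1                            # largest id is len(vocab)
encoder = Encoder(vocab_size, embedding_dim=16, enc_units=32, batch_sz=1)

ids = tf.constant([[char_to_idx[c] for c in text]])    # shape (1, 11)
hidden = encoder.initialize_hidden_state()
output, h, c = encoder(ids, hidden)
print(output.shape)                                    # (1, 11, 32)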
往日 2025-01-17 08:49:42


The vocabulary size should always be len(vocab)+1.
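
Building on both answers, a small illustrative helper (check_vocab_size and the sample ids below are mine, not from the post) that fails fast when the encoded data and the Embedding size disagree:

import numpy as np

def check_vocab_size(encoded_ids, vocab_size):
    """Raise early if any token id would fall outside the Embedding table."""
    max_id = int(np.max(encoded_ids))
    if max_id >= vocab_size:
        raise ValueError(f"max token id {max_id} needs vocab_size >= {max_id + 1}")

# ids up to 106 need an Embedding with input_dim of at least 107 (len(vocab) + 1)
check_vocab_size(np.array([[1, 52, 106]]), vocab_size=107)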
