Pre-trained BERT is not the correct shape for the LSTM layer: ValueError: total size of new array must be unchanged

Posted 2025-01-22 09:38:58


I am attempting to use a pre-trained BERT model in a Siamese neural network. However, I am having issues passing the BERT output to the shared LSTM layer. I encounter the following error:

ValueError: Exception encountered when calling layer "reshape_4" (type Reshape).

total size of new array must be unchanged, input_shape = [768], output_shape = [64, 768, 1]

Call arguments received:
  • inputs=tf.Tensor(shape=(None, 768), dtype=float32)

I read in several other posts that the input I feed into the LSTM should have shape [batch_size, 768, 1]. However, when I attempt to reshape, I run into the error above. How can I resolve it?

import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, Bidirectional, LSTM, Lambda, Dense
from tensorflow.keras.models import Model

# bert_preprocess / bert_encoder are the pre-trained BERT preprocessing and encoder
# layers (defined elsewhere, e.g. TF Hub KerasLayer objects), and
# exponent_neg_cosine_distance is a custom similarity function.

input_1 = Input(shape=(), dtype=tf.string, name='text')
preprocessed_text_1 = bert_preprocess(input_1)
outputs_1 = bert_encoder(preprocessed_text_1)
e1 = Reshape((64, 768, 1))(outputs_1['pooled_output'])  # this line raises the ValueError above

input_2 = Input(shape=(), dtype=tf.string, name='text')
preprocessed_text_2 = bert_preprocess(input_2)
outputs_2 = bert_encoder(preprocessed_text_2)
e2 = Reshape((64, 768, 1))(outputs_2['pooled_output'])

lstm_layer = Bidirectional(LSTM(50, dropout=0.2, recurrent_dropout=0.2))  # Won't work on GPU

x1 = lstm_layer(e1)
x2 = lstm_layer(e2)

mhd = lambda x: exponent_neg_cosine_distance(x[0], x[1])
merged = Lambda(function=mhd, output_shape=lambda x: x[0], name='cosine_distance')([x1, x2])
preds = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=[input_1, input_2], outputs=preds)

Comments (1)

爱她像谁 2025-01-29 09:38:58

You have to remove the batch size (=64) from the Reshape layers.
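
For illustration, a minimal sketch of the corrected Reshape calls, reusing the variables from the question's code and assuming the rest of the model stays the same. In Keras, Reshape's target shape describes a single sample; the batch dimension is kept implicit, so a (None, 768) pooled_output becomes (None, 768, 1):

# Sketch of the fix: drop the batch size (64) from the target shape.
# Each 768-dim pooled vector is treated as 768 timesteps of 1 feature.
e1 = Reshape((768, 1))(outputs_1['pooled_output'])  # -> shape (None, 768, 1)
e2 = Reshape((768, 1))(outputs_2['pooled_output'])  # -> shape (None, 768, 1)

x1 = lstm_layer(e1)  # the Bidirectional LSTM now receives (batch, timesteps=768, features=1)
x2 = lstm_layer(e2)

The batch size then stays dynamic (None) and is filled in at fit/predict time, which is why it must not appear in the Reshape target shape.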
