Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
I am developing an LSTM autoencoder model for anomaly detection. I have my keras model set up as below:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Masking, Reshape
def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model
Notice the attention layer that I added, attention_layer. Before adding it, the model compiled fine; after adding attention_layer, the model throws the following error:

ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1

My attention layer is set up as follows:
import keras.backend as K

class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to TensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
The idea of the attention mask is to allow the model to focus on more prominent features as it trains.
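For reference, here is a minimal shape check of the layer on a 3-D sequence input (a sketch with hypothetical sizes: 30 time steps, 4 features; it reuses the attention class and keras imports above and assumes a Keras 2.x / TensorFlow backend setup):

# Hypothetical standalone check, not part of the model function above.
check_in = Input(shape=(30, 4))                              # (batch, 30, 4)
check_seq = LSTM(units=64, return_sequences=True)(check_in)  # (batch, 30, 64)
check_ctx = attention()(check_seq)                           # context vector
print(K.int_shape(check_seq))                                # (None, 30, 64)
print(K.int_shape(check_ctx))                                # (None, 64): one 64-dim vector per sample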
Why am I getting the error above and how can I fix this?
Comments (2)
I think that the problem lies in this line:

RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)

This layer outputs a tensor of shape (batch_size, 64). So you output a single vector and then run the attention mechanism with respect to the batch dimension instead of the sequence dimension. It also means the attention output has a squeezed batch dimension, which no keras layer will accept. This is why the RepeatVector layer raises the error: it expects an input of at least shape (batch_dimension, dim). If you want to run the attention mechanism over a sequence, you should switch the line mentioned above to:

RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
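For completeness, here is a minimal sketch of the whole function with that one change applied (it assumes the same imports, X_train_dt, and attention class as in the question; the second Dropout variable is renamed so it no longer reuses dropout_layer_1):

def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    # Keep the full sequence of hidden states so attention can weight time steps.
    RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
    # attention collapses (batch, time_steps, 64) into a (batch, 64) context vector.
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    # RepeatVector now receives the 2-D tensor it expects.
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_2 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_2)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model

With return_sequences=True, the attention layer receives a (batch_size, time_steps, 64) tensor and returns a (batch_size, 64) context vector, which is exactly the 2-D input that RepeatVector expects.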
In the error shown, the problem is a mismatch of input dimensions for the lstm_2 layer. That layer expects a three-dimensional input (batch_size, time_steps, features), but your input (ndim=2) has only two dimensions.

To solve this, you must make sure that the input to that layer is 3-D. If your model uses the attention layer, you must make sure that its output shape is specified correctly and matches what the following layer expects; if necessary, change the model structure accordingly.

Also, the RepeatVector layer is not needed here and should be removed. In attention models, a RepeatVector layer is usually not used: its purpose is to repeat the input vector as many times as there are output time steps, but when an attention mechanism is used there is no need to repeat the output vector, because the importance weights already apply to every time step.

More specifically, in your model the output of the LSTM in RNN_layer_1 is first taken with return_sequences=True. Then, by applying the attention mechanism (the attention layer, with RepeatVector repeating the vector), the importance of each time step is determined. Finally, TimeDistributed(Dense) computes the output for each time step.
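One possible reading of this suggestion, sketched below, is to keep the attention weighting per time step and return the weighted sequence instead of the summed context vector, so the decoder LSTM receives a 3-D tensor directly and RepeatVector can be dropped. This is an interpretation, not code from the answer: the attention_seq class, its return_sequence flag, and create_RNN_with_attention_no_repeat are hypothetical names, and the same imports, X_train_dt, and keras.backend as K from the question are assumed.

class attention_seq(Layer):
    # Hypothetical variant of the attention layer above: it can return the
    # weighted sequence (batch, time_steps, features) instead of the summed
    # (batch, features) context vector.
    def __init__(self, return_sequence=True, **kwargs):
        self.return_sequence = return_sequence
        super(attention_seq, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention_seq, self).build(input_shape)

    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)   # alignment scores, (batch, T, 1)
        e = K.squeeze(e, axis=-1)               # (batch, T)
        alpha = K.softmax(e)                    # attention weight per time step
        alpha = K.expand_dims(alpha, axis=-1)   # (batch, T, 1)
        weighted = x * alpha                    # (batch, T, features)
        if self.return_sequence:
            return weighted                     # stay 3-D: no RepeatVector needed
        return K.sum(weighted, axis=1)          # (batch, features) context vector

def create_RNN_with_attention_no_repeat():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
    weighted_seq = attention_seq(return_sequence=True)(RNN_layer_1)  # still 3-D
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(weighted_seq)
    output = TimeDistributed(Dense(X_train_dt.shape[2]))(RNN_layer_2)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model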