Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
I am developing an LSTM autoencoder model for anomaly detection. I have my keras model set up as below:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Masking, Reshape
def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model
Notice the attention layer that I added, attention_layer. Before adding it, the model compiled fine; after adding attention_layer, the model throws the following error:

ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1

My attention layer is set up as follows:
import keras.backend as K

class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to TensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
The idea of the attention mask is to allow the model to focus on more prominent features as it trains.
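For reference, here is a minimal shape check of the layer on a 3-D sequence input (a sketch with hypothetical sizes: 30 time steps, 4 features; it reuses the attention class and keras imports above and assumes a Keras 2.x / TensorFlow backend setup):

# Hypothetical standalone check, not part of the model function above.
check_in = Input(shape=(30, 4))                              # (batch, 30, 4)
check_seq = LSTM(units=64, return_sequences=True)(check_in)  # (batch, 30, 64)
check_ctx = attention()(check_seq)                           # context vector
print(K.int_shape(check_seq))                                # (None, 30, 64)
print(K.int_shape(check_ctx))                                # (None, 64): one 64-dim vector per sample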
Why am I getting the error above and how can I fix this?
Comments (2)
I think that the problem lies in this line:

RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)

This layer outputs a tensor of shape (batch_size, 64). So you output a single vector and then run the attention mechanism with respect to the batch dimension instead of the sequence dimension. It also means the attention output has a squeezed batch dimension, which no keras layer will accept. This is why the RepeatVector layer raises the error: it expects an input of at least shape (batch_dimension, dim). If you want to run the attention mechanism over a sequence, you should switch the line mentioned above to:

RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
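For completeness, here is a minimal sketch of the whole function with that one change applied (it assumes the same imports, X_train_dt, and attention class as in the question; the second Dropout variable is renamed so it no longer reuses dropout_layer_1):

def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    # Keep the full sequence of hidden states so attention can weight time steps.
    RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
    # attention collapses (batch, time_steps, 64) into a (batch, 64) context vector.
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    # RepeatVector now receives the 2-D tensor it expects.
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_2 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_2)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model

With return_sequences=True, the attention layer receives a (batch_size, time_steps, 64) tensor and returns a (batch_size, 64) context vector, which is exactly the 2-D input that RepeatVector expects.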
In the error shown, the problem is a mismatch of input dimensions for the lstm_2 layer. That layer expects a three-dimensional input (batch_size, time_steps, features), but your input (ndim=2) has only two dimensions.

To solve this, you must make sure that the input to that layer is 3-D. If your model uses the attention layer, you must make sure that its output shape is specified correctly and matches what the following layer expects; if necessary, change the model structure accordingly.

Also, the RepeatVector layer is not needed here and should be removed. In attention models, a RepeatVector layer is usually not used: its purpose is to repeat the input vector as many times as there are output time steps, but when an attention mechanism is used there is no need to repeat the output vector, because the importance weights already apply to every time step.

More specifically, in your model the output of the LSTM in RNN_layer_1 is first taken with return_sequences=True. Then, by applying the attention mechanism (the attention layer, with RepeatVector repeating the vector), the importance of each time step is determined. Finally, TimeDistributed(Dense) computes the output for each time step.
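One possible reading of this suggestion, sketched below, is to keep the attention weighting per time step and return the weighted sequence instead of the summed context vector, so the decoder LSTM receives a 3-D tensor directly and RepeatVector can be dropped. This is an interpretation, not code from the answer: the attention_seq class, its return_sequence flag, and create_RNN_with_attention_no_repeat are hypothetical names, and the same imports, X_train_dt, and keras.backend as K from the question are assumed.

class attention_seq(Layer):
    # Hypothetical variant of the attention layer above: it can return the
    # weighted sequence (batch, time_steps, features) instead of the summed
    # (batch, features) context vector.
    def __init__(self, return_sequence=True, **kwargs):
        self.return_sequence = return_sequence
        super(attention_seq, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention_seq, self).build(input_shape)

    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)   # alignment scores, (batch, T, 1)
        e = K.squeeze(e, axis=-1)               # (batch, T)
        alpha = K.softmax(e)                    # attention weight per time step
        alpha = K.expand_dims(alpha, axis=-1)   # (batch, T, 1)
        weighted = x * alpha                    # (batch, T, features)
        if self.return_sequence:
            return weighted                     # stay 3-D: no RepeatVector needed
        return K.sum(weighted, axis=1)          # (batch, features) context vector

def create_RNN_with_attention_no_repeat():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
    weighted_seq = attention_seq(return_sequence=True)(RNN_layer_1)  # still 3-D
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(weighted_seq)
    output = TimeDistributed(Dense(X_train_dt.shape[2]))(RNN_layer_2)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model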