Problem with a recurrent neural network using timesteps in Keras
I am trying to design a recurrent classification network with Keras. I have analyzed key characteristics of the frames of a video, and from them I want to identify when certain events occur during the video.
Specifically, I have a matrix (30 x 2) for each frame, which represents the positions of various given objects. From these positions, I would like the network to detect 4 different events, as well as in which frames they occur.
As an example, suppose I have the position of 30 cars in each frame already detected, and I want the network to learn to detect the frames in which:
- a car stops
- a car starts
- two cars collide
- a car turns
In each frame, either one of these events occurs or none does (class 0), but never more than one.
Notably, identifying these 4 events requires data from both the previous frames and the later ones. For example, to know that two cars collide, it is necessary to know that both were in motion beforehand, and that neither moves after the collision.
Following this example, and just to clarify, suppose I have a sample of 100 frames, in which there is a crash at frames 4 and 75, a stop at 12, a start at 37, and turns at 3, 30, and 60. It would have an input of 100x30x2, and an output of 100x1.
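To make the shapes concrete, here is a minimal NumPy sketch of such a sample (the values are placeholders, and the class numbering 1-4 is an arbitrary choice for illustration):

import numpy as np

# Hypothetical toy sample: 100 frames, 30 cars, (x, y) per car.
x_sample = np.random.rand(100, 30, 2)     # input, shape (100, 30, 2)
y_sample = np.zeros((100, 1), dtype=int)  # output, class 0 = nothing happens
y_sample[[4, 75]] = 3                     # collisions at frames 4 and 75
y_sample[12] = 1                          # a stop at frame 12
y_sample[37] = 2                          # a start at frame 37
y_sample[[3, 30, 60]] = 4                 # turns at frames 3, 30 and 60
print(x_sample.shape, y_sample.shape)     # (100, 30, 2) (100, 1)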
After several hours, I get the feeling that I am not understanding something about how to describe the model to Keras.
So far I have been trying the following, with variations in the number of LSTM layers and the number of classification neurons:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.LSTM(100, input_shape=(30, 2)))   # the 30 objects act as timesteps, 2 features each
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))   # 5 classes: the 4 events plus class 0
model.summary()
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['SparseCategoricalAccuracy'])
I have also tried introducing the variation

model.add(layers.LSTM(100, input_shape=(30, 2), return_sequences=True))

so that not only the final output is taken into account, but it does not work unless I add a Flatten layer, and I deduce that I am not understanding the matter well.
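For reference, one pattern I have seen suggested for per-frame classification (I am not sure it is the right one here) treats frames, not objects, as the timestep axis, flattens each frame's 30 x 2 positions into 60 features, and classifies every timestep; a minimal sketch, with seq_model as a hypothetical name:

from tensorflow import keras
from tensorflow.keras import layers

# Sequence-to-sequence classifier sketch: return_sequences=True keeps one
# output per frame, and TimeDistributed applies the softmax classifier
# to each of them.
seq_model = keras.Sequential()
seq_model.add(layers.LSTM(100, input_shape=(None, 60), return_sequences=True))
seq_model.add(layers.TimeDistributed(layers.Dense(5, activation='softmax')))
seq_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                  metrics=['sparse_categorical_accuracy'])
# Inputs: (n_samples, n_frames, 60); targets: (n_samples, n_frames) integer labels.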
Edit 1:
Following the advice, I now have the following model. I start with the dataset, stored in an input xp and an output yp. Here I print the shapes of both variables:
xp.shape, yp.shape
((203384, 25, 2), (203384, 1))
Then I encode yp with keras.utils.to_categorical, and I change the shape of each input element from a (25 x 2) matrix to a (1 x 50) vector:
n = len(xp)
yp_encoded = keras.utils.to_categorical(yp)   # one-hot labels, shape (n, 5)
xp_reshaped = xp[0:n, :].reshape(n, 1, 50)    # flatten each (25 x 2) frame into 50 features
print(n, xp_reshaped.shape, yp_encoded.shape)
203384, (203384, 1, 50), (203384, 5)
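As a sanity check on what to_categorical produces (a toy example, not my real data):

import numpy as np
from tensorflow import keras

# Integer class labels become one-hot rows; with a maximum label of 4,
# to_categorical infers 5 classes and returns length-5 vectors.
toy = np.array([0, 2, 4])
print(keras.utils.to_categorical(toy))
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]]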
Afterwards, I define the model as discussed, with just LSTM layers:
batch_size = 10
model = keras.Sequential()
model.add(layers.LSTM(100, batch_input_shape=(batch_size, 1, 50),
                      activation='relu', return_sequences=True, stateful=True))
model.add(layers.LSTM(5, stateful=True, activation='softmax'))
model.summary()
Model: "sequential_140"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_184 (LSTM) (10, 1, 100) 58800
lstm_185 (LSTM) (10, 5) 2120
=================================================================
Total params: 60,920
Trainable params: 60,920
Non-trainable params: 0
So, as I understand it, I have an LSTM model that takes inputs of shape (batch_size, 1, 50) and that I should fit against outputs of shape (batch_size, 5).
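A quick way to confirm this, by checking the shapes of the model as built:

print(model.input_shape)   # (10, 1, 50)
print(model.output_shape)  # (10, 5)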
Then I compile and fit the model:
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['SparseCategoricalAccuracy'])
model.fit(xp_reshaped, yp_encoded, epochs=5, batch_size=batch_size, shuffle=False)
I get the following error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 5 for '{{node Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](IteratorGetNext:1)' with input shapes: [10,5].
Comments (1)
You're trying to solve a classification problem: classify frames as events with the classes start, stop, turn, collide, and class 0 (nothing happens). You've chosen the right approach, LSTMs, though your architecture is not good.
All your layers should be LSTMs with stateful=True and return_sequences=True, you should transform your target variable to one-hot encoding, and set the last layer's activation to softmax; its output shape will be (n_frames, 5).
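For what it's worth, the ValueError above most likely comes from the metric rather than the architecture: SparseCategoricalAccuracy expects integer labels and squeezes their last axis, which fails on one-hot (10, 5) targets; with one-hot targets the matching metric is CategoricalAccuracy. A minimal sketch along the lines of this answer (the last LSTM is left without return_sequences so its (batch_size, 5) output matches the one-hot targets):

from tensorflow import keras
from tensorflow.keras import layers

batch_size = 10
model = keras.Sequential()
model.add(layers.LSTM(100, batch_input_shape=(batch_size, 1, 50),
                      return_sequences=True, stateful=True))
model.add(layers.LSTM(5, stateful=True, activation='softmax'))
# categorical_accuracy matches one-hot targets; SparseCategoricalAccuracy
# would instead expect integer labels, which triggers the squeeze error.
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'])

Note that with stateful=True the number of samples generally has to be divisible by the batch size, and 203384 is not divisible by 10, so a few trailing samples may need to be dropped.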