Problem with a recurrent neural network using timesteps in Keras
I am trying to design a recurrent classification network with Keras. I have analyzed key characteristics of the frames of a video, and from them I want to identify when certain events occur during the video.
Specifically, I have a matrix (30 x 2) for each frame, which represents the positions of various given objects. From these positions, I would like the network to detect 4 different events, as well as in which frames they occur.
As an example, suppose I have the position of 30 cars in each frame already detected, and I want the network to learn to detect the frames in which:
- a car stops
- a car starts
- two cars collide
- a car turns
In each frame, either one of these events occurs or none does (class 0), but never more than one.
Notably, identifying these 4 events requires data from both the previous frames and the later ones. For example, to know that two cars collide, it is necessary to know that both were in motion beforehand, and that neither moves after the collision.
Following this example, and just to clarify, suppose I have a sample of 100 frames, in which there is a crash at frames 4 and 75, a stop at 12, a start at 37, and turns at 3, 30, and 60. It would have an input of 100x30x2, and an output of 100x1.
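To make the shapes concrete, here is a minimal NumPy sketch of such a sample (the values are placeholders, and the class numbering 1-4 is an arbitrary choice for illustration):

import numpy as np

# Hypothetical toy sample: 100 frames, 30 cars, (x, y) per car.
x_sample = np.random.rand(100, 30, 2)     # input, shape (100, 30, 2)
y_sample = np.zeros((100, 1), dtype=int)  # output, class 0 = nothing happens
y_sample[[4, 75]] = 3                     # collisions at frames 4 and 75
y_sample[12] = 1                          # a stop at frame 12
y_sample[37] = 2                          # a start at frame 37
y_sample[[3, 30, 60]] = 4                 # turns at frames 3, 30 and 60
print(x_sample.shape, y_sample.shape)     # (100, 30, 2) (100, 1)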
After several hours, I get the feeling that I am not understanding something about how to describe the model to Keras.
So far I have been trying the following, with variations in the number of LSTM layers and the number of classification neurons:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.LSTM(100, input_shape=(30, 2)))   # the 30 objects act as timesteps, 2 features each
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))   # 5 classes: the 4 events plus class 0
model.summary()
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['SparseCategoricalAccuracy'])
I have also tried introducing the variation

model.add(layers.LSTM(100, input_shape=(30, 2), return_sequences=True))

so that not only the final output is taken into account, but it does not work unless I add a Flatten layer, and I deduce that I am not understanding the matter well.
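For reference, one pattern I have seen suggested for per-frame classification (I am not sure it is the right one here) treats frames, not objects, as the timestep axis, flattens each frame's 30 x 2 positions into 60 features, and classifies every timestep; a minimal sketch, with seq_model as a hypothetical name:

from tensorflow import keras
from tensorflow.keras import layers

# Sequence-to-sequence classifier sketch: return_sequences=True keeps one
# output per frame, and TimeDistributed applies the softmax classifier
# to each of them.
seq_model = keras.Sequential()
seq_model.add(layers.LSTM(100, input_shape=(None, 60), return_sequences=True))
seq_model.add(layers.TimeDistributed(layers.Dense(5, activation='softmax')))
seq_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                  metrics=['sparse_categorical_accuracy'])
# Inputs: (n_samples, n_frames, 60); targets: (n_samples, n_frames) integer labels.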
Edit 1:
Following the advice, I now have the following model. I start with the dataset, stored in an input xp and an output yp. Here I print the shapes of both variables:
xp.shape, yp.shape
((203384, 25, 2), (203384, 1))
Then I encode yp with keras.utils.to_categorical, and I change the shape of each input element from a (25 x 2) matrix to a (1 x 50) vector:
n = len(xp)
yp_encoded = keras.utils.to_categorical(yp)   # one-hot labels, shape (n, 5)
xp_reshaped = xp[0:n, :].reshape(n, 1, 50)    # flatten each (25 x 2) frame into 50 features
print(n, xp_reshaped.shape, yp_encoded.shape)
203384, (203384, 1, 50), (203384, 5)
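As a sanity check on what to_categorical produces (a toy example, not my real data):

import numpy as np
from tensorflow import keras

# Integer class labels become one-hot rows; with a maximum label of 4,
# to_categorical infers 5 classes and returns length-5 vectors.
toy = np.array([0, 2, 4])
print(keras.utils.to_categorical(toy))
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]]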
Afterwards, I define the model as discussed, with just LSTM layers:
batch_size = 10
model = keras.Sequential()
model.add(layers.LSTM(100, batch_input_shape=(batch_size, 1, 50),
                      activation='relu', return_sequences=True, stateful=True))
model.add(layers.LSTM(5, stateful=True, activation='softmax'))
model.summary()
Model: "sequential_140"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_184 (LSTM) (10, 1, 100) 58800
lstm_185 (LSTM) (10, 5) 2120
=================================================================
Total params: 60,920
Trainable params: 60,920
Non-trainable params: 0
So, as I understand it, I have an LSTM model that takes inputs of shape (batch_size, 1, 50) and that I should fit against outputs of shape (batch_size, 5).
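A quick way to confirm this, by checking the shapes of the model as built:

print(model.input_shape)   # (10, 1, 50)
print(model.output_shape)  # (10, 5)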
Then I compile and fit the model:
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['SparseCategoricalAccuracy'])
model.fit(xp_reshaped, yp_encoded, epochs=5, batch_size=batch_size, shuffle=False)
I get the following error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 5 for '{{node Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](IteratorGetNext:1)' with input shapes: [10,5].
Comments (1)
You're trying to solve a classification problem: classify frames as events with the classes start, stop, turn, collide, and class 0 (nothing happens). You've chosen the right approach, LSTMs, though your architecture is not good.
All your layers should be LSTMs with stateful=True and return_sequences=True, you should transform your target variable to one-hot encoding, and set the last layer's activation to softmax; its output shape will be (n_frames, 5).
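For what it's worth, the ValueError above most likely comes from the metric rather than the architecture: SparseCategoricalAccuracy expects integer labels and squeezes their last axis, which fails on one-hot (10, 5) targets; with one-hot targets the matching metric is CategoricalAccuracy. A minimal sketch along the lines of this answer (the last LSTM is left without return_sequences so its (batch_size, 5) output matches the one-hot targets):

from tensorflow import keras
from tensorflow.keras import layers

batch_size = 10
model = keras.Sequential()
model.add(layers.LSTM(100, batch_input_shape=(batch_size, 1, 50),
                      return_sequences=True, stateful=True))
model.add(layers.LSTM(5, stateful=True, activation='softmax'))
# categorical_accuracy matches one-hot targets; SparseCategoricalAccuracy
# would instead expect integer labels, which triggers the squeeze error.
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'])

Note that with stateful=True the number of samples generally has to be divisible by the batch size, and 203384 is not divisible by 10, so a few trailing samples may need to be dropped.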