How to train an LSTM model with variable sequence length and multiple feature dimensions?
I'm training an LSTM network model for sign language recognition using MediaPipe features.
I'm having problems defining the model since the videos have different lengths; errors appear when training. I need the correct setup of the Keras layers.
In fact, all the solutions I have found are either LSTM models with a variable timestep dimension but only one feature (for example, a word), or video inputs with several features but the same fixed number of timesteps for all videos; I haven't found a single solution for variable timesteps with several features.
Dataset: formed by 35 signs (30 alphabet signs + 5 number signs).
Each video has a different length, from videos where MediaPipe only recognizes 4 frames to others with 111 frames.
For each frame, I'm extracting 21 landmarks from the hand with the MediaPipe library. Each of these landmarks has 5 features (x, y, z, visibility and presence), which makes 21 * 5 = 105 features per frame.
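A per-frame feature vector of this kind can be built along these lines (a minimal sketch; frame_features is a hypothetical helper, assuming each MediaPipe landmark exposes x, y, z, visibility and presence fields):

import numpy as np

def frame_features(hand_landmarks):
    # 21 landmarks x 5 values (x, y, z, visibility, presence) = 105 features
    return np.array([[lm.x, lm.y, lm.z, lm.visibility, lm.presence]
                     for lm in hand_landmarks.landmark], dtype='float32').flatten()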
Model Input:
Since the input of an LSTM needs to be a NumPy array with a constant number of frames, I have filled the empty spaces with the value 0 using the following code, so I can mask them later:
X = np.array([video + [[0] * 105] * (length - len(video)) for video in X]).astype('float32')
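The same padding can also be done with Keras' built-in helper (a sketch, assuming X is a list of per-video lists of 105-feature frames):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pads each video with all-zero frames after the real ones, up to the longest video
X = pad_sequences(X, padding='post', dtype='float32', value=0.0)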
The resulting array has dimension (3036, 111, 105), where 3036 is the number of videos in the dataset, 111 the timesteps / frames of a video and 105 the number of features of each frame.
Each one of the videos (111 timesteps, 105 features) looks like this:
0.85280,0.84741,-0.07237,0.00000,0.00000 ... 0.000
0.83034,0.93954,-0.11003,0.00000,0.00000 ... 1.000
...
0.82979,0.99424,-0.12224,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000
...
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000
Model:
model = Sequential()
model.add(Masking(mask_value=0, input_shape=(None, 35)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))
In this case I'm getting the error ValueError: Input 0 is incompatible with layer lstm: expected shape=(None, None, 35), found shape=[None, 111, 105].
How can I correctly set up the Keras layers, especially the Masking layer?
If I remove the Masking layer I am able to make it run, but then my loss is always NaN and all predictions are always the same value.
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(None, 105)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))
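For context, compiling and fitting looks roughly like this (a sketch; the actual optimizer, loss and label encoding are assumptions, with y holding one-hot encoded labels):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=32, validation_split=0.1)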
Note: some of these signs are dynamic, so I'm not interested in taking a single frame for the whole video.
As alternatives to masking I have thought of options like:
- Extracting only the features of 30 frames from each video. Videos with fewer frames would be filled with synthetic data (repeated frames of the video); see the sketch after this list.
- Or filtering out videos with very little data (like the one with only 4 frames) to clean the dataset.
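A minimal sketch of the first alternative (fit_to_length is a hypothetical helper; 30 is the target frame count from the list above):

import numpy as np

def fit_to_length(video, target=30):
    # Truncate long videos; fill short ones by repeating their frames cyclically
    video = np.asarray(video, dtype='float32')
    if len(video) >= target:
        return video[:target]
    reps = int(np.ceil(target / len(video)))
    return np.tile(video, (reps, 1))[:target]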
However, I would prefer masking the blank frames of the np array, if possible.
1 Answer
You're really close. You need to change the Masking layer input size to:
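model.add(Masking(mask_value=0, input_shape=(None, 105)))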
It gets 105 features over a varying number of timesteps. The way the Masking layer works is that if all of those 105 features are 0, it will skip that timestep. From the documentation:
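"For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking)."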