How can I train an LSTM model with variable sequence lengths and multiple feature dimensions?

Posted on 2025-02-06 14:02:18


I'm training an LSTM network model for sign language recognition using MediaPipe features.

I'm having problems defining the model because the videos have different lengths: errors appear when training the model. I need the correct setup of the Keras layers.

In fact, all the solutions I have found are for LSTM models with a variable timestep dimension but only one feature (for example, a word), or for video input with several features but the same fixed number of timesteps for all videos — not a single solution for variable timesteps combined with several features.

Dataset: formed by 35 signs (30 alphabet signs + 5 number signs).

Each video has a different length, from videos where MediaPipe only recognizes 4 frames to others with 111 frames.

For each frame, I'm extracting 21 hand landmarks with the MediaPipe library; each of these landmarks has 5 features (x, y, z, visibility and presence), which makes 21 * 5 = 105 features per frame.
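To make the 21 * 5 = 105 count concrete, here is a minimal sketch (with random stand-in values, not the real MediaPipe output) of flattening one frame's landmark array into the per-frame feature vector:

```python
import numpy as np

# Hypothetical per-frame landmark array: 21 landmarks x 5 features
# (x, y, z, visibility, presence), as described above.
frame = np.random.rand(21, 5).astype('float32')

# Flatten to a single 105-value feature vector for the LSTM input.
features = frame.reshape(-1)
print(features.shape)  # (105,)
```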

Model Input:

As the input of the LSTM needs to be a NumPy array with a constant number of frames, I have filled the empty slots with the value 0 using the following code, so I can mask them later:

# `length` is the longest video in the dataset (111 frames here); shorter
# videos are padded at the end with all-zero 105-feature frames.
X = np.array([video + [[0] * 105] * (length - len(video)) for video in X]).astype('float32')
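The padding step can be checked at a small scale (a sketch with toy dimensions — 2 videos, up to 3 frames, 4 features instead of 3036 / 111 / 105):

```python
import numpy as np

n_features = 4
# Toy stand-in for the dataset: two "videos" with 2 and 3 frames each.
X = [
    [[0.1] * n_features, [0.2] * n_features],
    [[0.3] * n_features, [0.4] * n_features, [0.5] * n_features],
]

length = max(len(video) for video in X)  # longest video defines the timestep axis

# Pad every video with all-zero frames up to `length`, as in the question.
X_padded = np.array(
    [video + [[0.0] * n_features] * (length - len(video)) for video in X]
).astype('float32')

print(X_padded.shape)  # (2, 3, 4): videos x timesteps x features
```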

The resulting array has dimension (3036, 111, 105), where 3036 is the number of videos in the dataset, 111 the timesteps/frames per video and 105 the number of features per frame.

Each of the videos (111 timesteps, 105 features) looks like this:

0.85280,0.84741,-0.07237,0.00000,0.00000 ... 0.000
0.83034,0.93954,-0.11003,0.00000,0.00000 ... 1.000
...
0.82979,0.99424,-0.12224,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000
...
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000

Model:

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(None, 35)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))

In this case I get the error ValueError: Input 0 is incompatible with layer lstm: expected shape=(None, None, 35), found shape=[None, 111, 105]

How can I set up the Keras layers correctly, especially the Masking layer?

If I remove the Masking layer I am able to make it run, but then my loss is always NaN and all predictions are always the same value.

model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(None, 105)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))

Note: some of these signs are dynamic, so I'm not interested in taking a single frame for the whole video.

As alternatives to masking I have thought of:

  • Extracting the features of only 30 frames from each video. Videos with fewer frames would be filled with synthetic data (repeated frames of the video)
  • Or filtering out videos with little data (like the one with only 4 frames) to clean the dataset

However, I would prefer masking the blank frames of the NumPy array, if possible.
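The first alternative above (fixing every sequence at 30 frames, repeating frames for shorter videos) could be sketched like this; `resample_video` is a hypothetical helper, not part of the original post:

```python
import numpy as np

def resample_video(video, target_len=30):
    """Uniformly sample (or repeat) frame indices so every video has
    exactly `target_len` frames. Short videos get repeated frames;
    long videos get uniformly thinned."""
    video = np.asarray(video, dtype='float32')
    idx = np.linspace(0, len(video) - 1, target_len).round().astype(int)
    return video[idx]

# A short "video" of 4 frames with 3 features each, stretched to 6 frames.
short = np.arange(4 * 3, dtype='float32').reshape(4, 3)
print(resample_video(short, target_len=6).shape)  # (6, 3)
```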


1 comment

简美 · 2025-02-13 14:02:18


You're really close. You need to change the Masking layer input shape to:

model.add(Masking(mask_value=0, input_shape=(None, 105)))

It receives 105 features over a varying number of timesteps. The way the Masking layer works is that if all 105 features at a given timestep are 0, that timestep is skipped. From the documentation:

For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
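The rule quoted above can be reproduced in plain NumPy to sanity-check which padded timesteps will be skipped — a sketch of the boolean mask the Masking layer computes internally, using 4 features per frame instead of the real 105:

```python
import numpy as np

# One padded "video": 3 real frames followed by 2 all-zero padding frames.
video = np.array([
    [0.85, 0.84, -0.07, 1.0],
    [0.83, 0.93, -0.11, 1.0],
    [0.82, 0.99, -0.12, 1.0],
    [0.0,  0.0,   0.0,  0.0],
    [0.0,  0.0,   0.0,  0.0],
], dtype='float32')

# A timestep survives only if at least one feature differs from mask_value (0).
mask = ~np.all(video == 0.0, axis=-1)
print(mask)  # [ True  True  True False False]
```

Note that a real frame whose every feature happens to equal exactly 0 would also be masked, which is why 0 is only a safe mask_value if genuine frames never consist entirely of zeros.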
