How can I train an LSTM model with variable sequence lengths and multiple feature dimensions?

Posted on 2025-02-06 14:02:18


I'm training an LSTM network model for sign language recognition using MediaPipe features.

I'm having problems defining the model because the videos have different lengths: errors appear when training the model. I need the correct setup of the Keras layers.

In fact, all the solutions I have found are for LSTM models with a variable timestep dimension but only one feature (for example, a word), or for video input with several features but the same fixed number of timesteps for all videos — not a single solution for variable timesteps combined with several features.

Dataset: formed by 35 signs (30 alphabet signs + 5 number signs).

Each video has a different length, from videos where MediaPipe only recognizes 4 frames to others with 111 frames.

For each frame, I'm extracting 21 hand landmarks with the MediaPipe library; each of these landmarks has 5 features (x, y, z, visibility and presence), which makes 21 * 5 = 105 features per frame.
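To make the 21 * 5 = 105 count concrete, here is a minimal sketch (with random stand-in values, not the real MediaPipe output) of flattening one frame's landmark array into the per-frame feature vector:

```python
import numpy as np

# Hypothetical per-frame landmark array: 21 landmarks x 5 features
# (x, y, z, visibility, presence), as described above.
frame = np.random.rand(21, 5).astype('float32')

# Flatten to a single 105-value feature vector for the LSTM input.
features = frame.reshape(-1)
print(features.shape)  # (105,)
```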

Model Input:

As the input of the LSTM needs to be a NumPy array with a constant number of frames, I have filled the empty slots with the value 0 using the following code, so I can mask them later:

# `length` is the longest video in the dataset (111 frames here); shorter
# videos are padded at the end with all-zero 105-feature frames.
X = np.array([video + [[0] * 105] * (length - len(video)) for video in X]).astype('float32')
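The padding step can be checked at a small scale (a sketch with toy dimensions — 2 videos, up to 3 frames, 4 features instead of 3036 / 111 / 105):

```python
import numpy as np

n_features = 4
# Toy stand-in for the dataset: two "videos" with 2 and 3 frames each.
X = [
    [[0.1] * n_features, [0.2] * n_features],
    [[0.3] * n_features, [0.4] * n_features, [0.5] * n_features],
]

length = max(len(video) for video in X)  # longest video defines the timestep axis

# Pad every video with all-zero frames up to `length`, as in the question.
X_padded = np.array(
    [video + [[0.0] * n_features] * (length - len(video)) for video in X]
).astype('float32')

print(X_padded.shape)  # (2, 3, 4): videos x timesteps x features
```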

The resulting array has dimension (3036, 111, 105), where 3036 is the number of videos in the dataset, 111 the timesteps/frames per video and 105 the number of features per frame.

Each of the videos (111 timesteps, 105 features) looks like this:

0.85280,0.84741,-0.07237,0.00000,0.00000 ... 0.000
0.83034,0.93954,-0.11003,0.00000,0.00000 ... 1.000
...
0.82979,0.99424,-0.12224,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 1.000
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000
...
0.00000,0.00000, 0.00000,0.00000,0.00000 ... 0.000

Model:

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(None, 35)))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))

In this case I get the error ValueError: Input 0 is incompatible with layer lstm: expected shape=(None, None, 35), found shape=[None, 111, 105]

How can I set up the Keras layers correctly, especially the Masking layer?

If I remove the Masking layer I am able to make it run, but then my loss is always NaN and all predictions are always the same value.

model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(None, 105)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(name_classes.keys()), activation='softmax'))

Note: some of these signs are dynamic, so I'm not interested in taking a single frame for the whole video.

As alternatives to masking I have thought of:

  • Extracting the features of only 30 frames from each video. Videos with fewer frames would be filled with synthetic data (repeated frames of the video)
  • Or filtering out videos with little data (like the one with only 4 frames) to clean the dataset

However, I would prefer masking the blank frames of the NumPy array, if possible.
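The first alternative above (fixing every sequence at 30 frames, repeating frames for shorter videos) could be sketched like this; `resample_video` is a hypothetical helper, not part of the original post:

```python
import numpy as np

def resample_video(video, target_len=30):
    """Uniformly sample (or repeat) frame indices so every video has
    exactly `target_len` frames. Short videos get repeated frames;
    long videos get uniformly thinned."""
    video = np.asarray(video, dtype='float32')
    idx = np.linspace(0, len(video) - 1, target_len).round().astype(int)
    return video[idx]

# A short "video" of 4 frames with 3 features each, stretched to 6 frames.
short = np.arange(4 * 3, dtype='float32').reshape(4, 3)
print(resample_video(short, target_len=6).shape)  # (6, 3)
```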


1 comment

简美 · 2025-02-13 14:02:18


You're really close. You need to change the Masking layer input shape to:

model.add(Masking(mask_value=0, input_shape=(None, 105)))

It receives 105 features over a varying number of timesteps. The way the Masking layer works is that if all 105 features at a given timestep are 0, that timestep is skipped. From the documentation:

For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
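The rule quoted above can be reproduced in plain NumPy to sanity-check which padded timesteps will be skipped — a sketch of the boolean mask the Masking layer computes internally, using 4 features per frame instead of the real 105:

```python
import numpy as np

# One padded "video": 3 real frames followed by 2 all-zero padding frames.
video = np.array([
    [0.85, 0.84, -0.07, 1.0],
    [0.83, 0.93, -0.11, 1.0],
    [0.82, 0.99, -0.12, 1.0],
    [0.0,  0.0,   0.0,  0.0],
    [0.0,  0.0,   0.0,  0.0],
], dtype='float32')

# A timestep survives only if at least one feature differs from mask_value (0).
mask = ~np.all(video == 0.0, axis=-1)
print(mask)  # [ True  True  True False False]
```

Note that a real frame whose every feature happens to equal exactly 0 would also be masked, which is why 0 is only a safe mask_value if genuine frames never consist entirely of zeros.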
