Transformer (Multi-Head Attention) on audio features: validation accuracy is always the same
I have a problem with a Transformer model I am building. Whatever parameters I change, I always get 11.86% validation accuracy, and it does not change even if I train the model with only one input. The accuracy changes only when I change the size of the validation data. I tried to follow this guide. I have 500 audio files and extracted 20 MFCC features from each, so the data has shape (500, 20, 1). The labels are the emotions belonging to these audio clips. Since the features are already numeric, I did not use any embedding or tokenization.
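For context, the features were produced roughly like this (a minimal sketch assuming librosa; the file list and label handling are illustrative, not the real code):

    # Minimal sketch of the MFCC extraction, assuming librosa is available.
    import numpy as np
    import librosa

    def extract_mfcc(paths, n_mfcc=20):
        feats = []
        for path in paths:
            y, sr = librosa.load(path, sr=None)                      # raw waveform
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (20, n_frames)
            feats.append(mfcc.mean(axis=1))                          # average over time -> (20,)
        return np.asarray(feats)[..., np.newaxis]                    # (n_samples, 20, 1)

Here is the model code right now: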
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    def __init__(self, key_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        # Self-attention plus a position-wise feed-forward network,
        # each followed by dropout, a residual connection, and layer norm.
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(key_dim)]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
key_dim = 2    # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 32    # Hidden layer size in feed-forward network inside transformer

inputs = layers.Input(shape=(20, 1))
transformer_block = TransformerBlock(key_dim, num_heads, ff_dim)
x = transformer_block(inputs)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(7, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7, amsgrad=False, name="Adam"
)
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    x_train, y_train, batch_size=16, epochs=15, validation_data=(x_val, y_val)
)
Training starts, but the results look like this:
Epoch 1/15
30/30 [==============================] - 1s 17ms/step - loss: 1.9461 - accuracy: 0.1213 - val_loss: 1.9471 - val_accuracy: 0.1186
Epoch 2/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9457 - accuracy: 0.1489 - val_loss: 1.9479 - val_accuracy: 0.1186
Epoch 3/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9456 - accuracy: 0.1489 - val_loss: 1.9489 - val_accuracy: 0.1186
Epoch 4/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9452 - accuracy: 0.1489 - val_loss: 1.9501 - val_accuracy: 0.1186
Epoch 5/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9451 - accuracy: 0.1489 - val_loss: 1.9510 - val_accuracy: 0.1186
Epoch 6/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9450 - accuracy: 0.1277 - val_loss: 1.9523 - val_accuracy: 0.1186
Validation accuracy stays at 11.86% no matter which parameters I change, and even when I change the size of the training data. I think I made a mistake while building the model, but I cannot find it. At first I tried normalization and one-hot encoding, but after seeing that training on a single sample still gives 11.86% accuracy, I concluded the problem is independent of the data. There should be a problem in the construction of the model, but I cannot see it.
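One way to probe this (a diagnostic sketch using the variables defined above) is to check whether the model has collapsed to predicting a single class; a frozen val_accuracy usually equals the share of the most frequent class in the validation labels:

    # Does the model always predict the same class, and does 0.1186 match
    # a class frequency in the validation labels? Uses x_val/y_val from above.
    import numpy as np

    pred_classes = model.predict(x_val).argmax(axis=1)
    print(np.unique(pred_classes, return_counts=True))    # collapsed to one class?

    true_classes = y_val.argmax(axis=1)                   # y_val is one-hot
    freqs = np.bincount(true_classes, minlength=7) / len(true_classes)
    print(freqs)                                          # is 0.1186 one of these?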
This is the model summary: [model.summary() output omitted]
Edit 1: I tried increasing the units of the dense layers, but the result did not change.
Edit 2: I trained a simple CNN model on the same data and got exactly 11.86% validation accuracy with it too, so I am not sure what the problem is. Maybe there is a problem with the data?
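To test whether the data itself is broken, a basic sanity check (a sketch, assuming x_train/y_train are the arrays passed to model.fit) could be:

    # Basic data sanity check on the training arrays.
    import numpy as np

    print(x_train.shape, x_train.dtype)   # expect (n_samples, 20, 1), float
    print(np.isnan(x_train).any())        # NaNs in the MFCCs would stall training
    print(x_train.min(), x_train.max())   # raw MFCCs span large ranges; consider scaling
    print(y_train.sum(axis=0))            # per-class counts if labels are one-hot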