Transformer with audio features (Multi-Head-Attention), validation accuracy always the same


I am having trouble creating a Transformer model. Whatever I change in the parameters, I always get 11.86% validation accuracy, and it does not change even if I train the model with only one input. The accuracy changes only if I change the size of the validation data. I tried to follow this guide. I have 500 audio files and I extracted 20 MFCC features from each, so the data has shape (500, 20, 1). The labels are the emotions belonging to these audio files. Since I already have the features, I did not use any embedding or tokenization. Here is the code right now:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class TransformerBlock(layers.Layer):
    def __init__(self, key_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(key_dim)]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training=None):
        # Self-attention over the sequence, then a position-wise feed-forward
        # network, each followed by dropout, a residual connection and layer norm.
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)


key_dim = 2    # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 32    # Hidden layer size in the feed-forward network inside the transformer

inputs = layers.Input(shape=(20, 1))  # 20 MFCC features, 1 channel per feature
transformer_block = TransformerBlock(key_dim, num_heads, ff_dim)
x = transformer_block(inputs)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(7, activation="softmax")(x)  # 7 emotion classes

model = keras.Model(inputs=inputs, outputs=outputs)
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7, amsgrad=False, name="Adam"
)
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])

history = model.fit(
    x_train, y_train, batch_size=16, epochs=15, validation_data=(x_val, y_val)
)
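One detail worth spelling out, since the model is compiled with categorical_crossentropy: that loss expects one-hot encoded labels, whereas integer class ids would need sparse_categorical_crossentropy. Below is a minimal sketch of both options (it assumes the emotion labels start out as integers 0-6; y_train_int and y_val_int are placeholder names, not variables from the code above):

from tensorflow.keras.utils import to_categorical

# Option 1: one-hot encode integer labels so they match categorical_crossentropy.
y_train = to_categorical(y_train_int, num_classes=7)  # shape (n_samples, 7)
y_val = to_categorical(y_val_int, num_classes=7)

# Option 2: keep the integer labels and switch to the sparse loss instead.
# model.compile(optimizer=optimizer,
#               loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])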

With the code above, training runs, but the results look like this:

30/30 [==============================] - 1s 17ms/step - loss: 1.9461 - accuracy: 0.1213 - val_loss: 1.9471 - val_accuracy: 0.1186
Epoch 2/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9457 - accuracy: 0.1489 - val_loss: 1.9479 - val_accuracy: 0.1186
Epoch 3/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9456 - accuracy: 0.1489 - val_loss: 1.9489 - val_accuracy: 0.1186
Epoch 4/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9452 - accuracy: 0.1489 - val_loss: 1.9501 - val_accuracy: 0.1186
Epoch 5/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9451 - accuracy: 0.1489 - val_loss: 1.9510 - val_accuracy: 0.1186
Epoch 6/15
30/30 [==============================] - 0s 11ms/step - loss: 1.9450 - accuracy: 0.1277 - val_loss: 1.9523 - val_accuracy: 0.1186

Validation accuracy stays at 11.86% no matter which parameters I change, and even when I change the size of the training data. I think I made a mistake while building the model, but I cannot find the problem. At first I tried normalization and one-hot encoding, but once I saw that training on a single sample still gives 11.86% accuracy, I concluded that the problem is completely independent of the data. There should be a problem in how the model is built, but I cannot see it.
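A validation accuracy stuck at exactly 11.86% usually means the model predicts the same class for every sample, and that class makes up 11.86% of the validation set. A quick way to check this (a sketch; it assumes y_val is one-hot encoded, matching the categorical_crossentropy loss):

import numpy as np

# True class distribution in the validation set.
true_classes = np.argmax(y_val, axis=-1)
print(np.unique(true_classes, return_counts=True))

# Predicted class distribution: if the model has collapsed to one class,
# nearly every prediction will share the same class id.
pred_classes = np.argmax(model.predict(x_val), axis=-1)
print(np.unique(pred_classes, return_counts=True))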

This is the model summary:

(model summary image)

Edit 1: I tried increasing the number of units in the dense layers, but the result did not change.

Edit 2: I trained a simple CNN model on the same data and got exactly 11.86% accuracy with it too, so I am not sure what the problem is. Maybe the data has a problem?
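Since the same 11.86% shows up with a completely different architecture, a basic check of the arrays themselves may be the next step, for example (a sketch, assuming x_train, y_train, x_val and y_val are NumPy arrays):

import numpy as np

# Shapes, dtypes, NaN counts, and value ranges of every array fed to the model.
for name, arr in [("x_train", x_train), ("y_train", y_train),
                  ("x_val", x_val), ("y_val", y_val)]:
    nans = np.isnan(arr).sum() if np.issubdtype(arr.dtype, np.floating) else 0
    print(name, arr.shape, arr.dtype, "NaNs:", nans,
          "min:", arr.min(), "max:", arr.max())

# MFCC coefficients can have very different scales; standardizing them with
# statistics computed on the training set alone is one common preprocessing step.
mean = x_train.mean(axis=0, keepdims=True)
std = x_train.std(axis=0, keepdims=True) + 1e-8
x_train = (x_train - mean) / std
x_val = (x_val - mean) / std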
