Sigmoid activation output layer produces many near-1 values
:)
I have a dataset of ~16,000 .wav recordings from 70 bird species.
I'm training a model using TensorFlow to classify the mel-spectrograms of these recordings using convolution-based architectures.
One of the architectures used is a simple multi-layer convolutional network, described below.
The pre-processing phase includes (a minimal code sketch follows the list below):
- extract mel-spectrograms and convert them to the dB scale
- segment the audio into 1-second segments (pad with zeros or Gaussian noise if the residual is longer than 250 ms, discard it otherwise)
- z-score normalization of the training data: subtract the mean and divide by the standard deviation
Pre-processing at inference time:
- same as described above
- z-score normalization using the training statistics: subtract the training mean and divide by the training standard deviation
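For concreteness, here is a minimal sketch of such a pipeline. It is not the code from the question; the sample rate, number of mel bands, and the global mean/std normalization are assumptions.

    # Hedged sketch of the described pre-processing (assumed: 22050 Hz, 128 mel bands).
    import numpy as np
    import librosa

    SR = 22050                      # assumed sample rate
    SEG_LEN = SR                    # 1-second segments
    MIN_RESIDUAL = int(0.25 * SR)   # 250 ms threshold: pad if longer, discard otherwise

    def wav_to_segments(path):
        y, _ = librosa.load(path, sr=SR)
        segments = []
        for start in range(0, len(y), SEG_LEN):
            chunk = y[start:start + SEG_LEN]
            if len(chunk) == SEG_LEN:
                segments.append(chunk)
            elif len(chunk) > MIN_RESIDUAL:
                # pad the residual with zeros (Gaussian noise would also work)
                segments.append(np.pad(chunk, (0, SEG_LEN - len(chunk))))
            # residuals of 250 ms or less are discarded
        return segments

    def to_db_mel(segment, n_mels=128):
        mel = librosa.feature.melspectrogram(y=segment, sr=SR, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)

    # z-score statistics are computed on the TRAINING spectrograms only
    # and reused unchanged at inference time.
    def fit_zscore(train_specs):
        stacked = np.stack(train_specs)
        return stacked.mean(), stacked.std()

    def apply_zscore(spec, mean, std):
        return (spec - mean) / std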
I understand that the output layer's probabilities with sigmoid activation are not supposed to sum to 1, but I get many (8-10) very high predicted probabilities (~0.999), and some are exactly 0.5.
The current test-set correct classification rate is ~84%, measured with 10-fold cross-validation, so the network seems to mostly operate well.
Notes:
1. I understand there are similar features in the vocalizations of different bird species, but the received probabilities don't seem to reflect them correctly.
2. Example probabilities for a recording of natural noise:
Natural noise: 0.999
Mallard: 0.981
I'm trying to understand the reason for these results, whether it's related to the data (e.g. extensive mislabeling, which is probably not the case) or to another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are needed, and I don't need them to sum to 1.
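As a quick illustration of that point (not from the original post), sigmoid treats each class score independently, while softmax normalizes across classes; the logit values below are hypothetical:

    import numpy as np

    logits = np.array([3.2, 2.9, -1.0])            # hypothetical per-class scores
    sigmoid = 1 / (1 + np.exp(-logits))            # each value in (0, 1), independent of the others
    softmax = np.exp(logits) / np.exp(logits).sum()

    print(sigmoid, sigmoid.sum())   # e.g. [0.961 0.948 0.269], sums to ~2.18
    print(softmax, softmax.sum())   # sums to exactly 1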
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.layers import (InputLayer, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dropout, Dense)

def convnet1(input_shape, numClasses, activation='softmax'):
    # Define the network (note: the `activation` argument is currently unused;
    # the output layer's sigmoid activation is hard-coded below)
    model = tf.keras.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    # model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dense(numClasses, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(numClasses, activation='sigmoid'))
    model.compile(
        loss='categorical_crossentropy',
        metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=0.001),
        run_eagerly=False)  # setting run_eagerly=True allows debugging with regular Python calls inside layers: print(), save(), etc.
    return model
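For reference, building the model might look like the following; the input shape (128 mel bands x 44 frames x 1 channel for a 1-second segment) is an assumption for illustration, not from the original post.

    # Hypothetical usage: the input shape is an assumption, numClasses matches the 70 species.
    model = convnet1(input_shape=(128, 44, 1), numClasses=70)
    model.summary()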
For future searches: this problem was solved, and the reason was found(!).
The initial batch size used was 256 or 512. Reducing the batch size to 16 or 32 solved the problem, and now the differences in probabilities are as expected for both training and test set samples: very high for the correct label and very low for the other classes.
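In Keras terms, the fix amounts to passing a smaller batch_size to fit; the variable names and epoch count below are placeholders, not from the original answer:

    # Hypothetical training call illustrating the fix: only batch_size changed (256/512 -> 32).
    model.fit(train_specs, train_labels,
              validation_data=(val_specs, val_labels),
              batch_size=32, epochs=50)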