Sigmoid activation output layer produces many near-1 values
:)
I have a dataset of ~16,000 .wav recordings from 70 bird species.
I'm training a model using TensorFlow to classify the mel-spectrograms of these recordings using convolution-based architectures.
One of the architectures used is a simple multi-layer convolutional network, described below.
The pre-processing phase includes (a minimal code sketch follows the list below):
- extract mel-spectrograms and convert them to the dB scale
- segment the audio into 1-second segments (pad with zeros or Gaussian noise if the residual is longer than 250 ms, discard it otherwise)
- z-score normalization of the training data: subtract the mean and divide by the standard deviation
Pre-processing at inference time:
- same as described above
- z-score normalization using the training statistics: subtract the training mean and divide by the training standard deviation
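For concreteness, here is a minimal sketch of such a pipeline. It is not the code from the question; the sample rate, number of mel bands, and the global mean/std normalization are assumptions.

    # Hedged sketch of the described pre-processing (assumed: 22050 Hz, 128 mel bands).
    import numpy as np
    import librosa

    SR = 22050                      # assumed sample rate
    SEG_LEN = SR                    # 1-second segments
    MIN_RESIDUAL = int(0.25 * SR)   # 250 ms threshold: pad if longer, discard otherwise

    def wav_to_segments(path):
        y, _ = librosa.load(path, sr=SR)
        segments = []
        for start in range(0, len(y), SEG_LEN):
            chunk = y[start:start + SEG_LEN]
            if len(chunk) == SEG_LEN:
                segments.append(chunk)
            elif len(chunk) > MIN_RESIDUAL:
                # pad the residual with zeros (Gaussian noise would also work)
                segments.append(np.pad(chunk, (0, SEG_LEN - len(chunk))))
            # residuals of 250 ms or less are discarded
        return segments

    def to_db_mel(segment, n_mels=128):
        mel = librosa.feature.melspectrogram(y=segment, sr=SR, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)

    # z-score statistics are computed on the TRAINING spectrograms only
    # and reused unchanged at inference time.
    def fit_zscore(train_specs):
        stacked = np.stack(train_specs)
        return stacked.mean(), stacked.std()

    def apply_zscore(spec, mean, std):
        return (spec - mean) / std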
I understand that the output layer's probabilities with sigmoid activation are not supposed to sum to 1, but I get many (8-10) very high predicted probabilities (~0.999), and some are exactly 0.5.
The current test-set correct classification rate is ~84%, measured with 10-fold cross-validation, so the network seems to mostly operate well.
Notes:
1. I understand there are similar features in the vocalizations of different bird species, but the received probabilities don't seem to reflect them correctly.
2. Example probabilities for a recording of natural noise:
Natural noise: 0.999
Mallard: 0.981
I'm trying to understand the reason for these results, whether it's related to the data (e.g. extensive mislabeling, which is probably not the case) or to another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are needed, and I don't need them to sum to 1.
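As a quick illustration of that point (not from the original post), sigmoid treats each class score independently, while softmax normalizes across classes; the logit values below are hypothetical:

    import numpy as np

    logits = np.array([3.2, 2.9, -1.0])            # hypothetical per-class scores
    sigmoid = 1 / (1 + np.exp(-logits))            # each value in (0, 1), independent of the others
    softmax = np.exp(logits) / np.exp(logits).sum()

    print(sigmoid, sigmoid.sum())   # e.g. [0.961 0.948 0.269], sums to ~2.18
    print(softmax, softmax.sum())   # sums to exactly 1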
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.layers import (InputLayer, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dropout, Dense)

def convnet1(input_shape, numClasses, activation='softmax'):
    # Define the network (note: the `activation` argument is currently unused;
    # the output layer's sigmoid activation is hard-coded below)
    model = tf.keras.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    # model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dense(numClasses, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(numClasses, activation='sigmoid'))
    model.compile(
        loss='categorical_crossentropy',
        metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=0.001),
        run_eagerly=False)  # setting run_eagerly=True allows debugging with regular Python calls inside layers: print(), save(), etc.
    return model
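For reference, building the model might look like the following; the input shape (128 mel bands x 44 frames x 1 channel for a 1-second segment) is an assumption for illustration, not from the original post.

    # Hypothetical usage: the input shape is an assumption, numClasses matches the 70 species.
    model = convnet1(input_shape=(128, 44, 1), numClasses=70)
    model.summary()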
For future searches: this problem was solved, and the reason was found(!).
The initial batch size used was 256 or 512. Reducing the batch size to 16 or 32 solved the problem, and now the differences in probabilities are as expected for both training and test set samples: very high for the correct label and very low for the other classes.
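In Keras terms, the fix amounts to passing a smaller batch_size to fit; the variable names and epoch count below are placeholders, not from the original answer:

    # Hypothetical training call illustrating the fix: only batch_size changed (256/512 -> 32).
    model.fit(train_specs, train_labels,
              validation_data=(val_specs, val_labels),
              batch_size=32, epochs=50)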