I am currently working on an audio classification task using YAMNet, a pretrained model from TF Hub. I use it to extract embeddings from audio clips, and then I feed those embeddings to a second, simple classification model made of two dense layers, which does the classification.
The problem is that the embeddings given by YAMNet always lead to the third class having the highest value, so it is always the predicted class.
If anyone has worked on an issue like this, please help. Thanks in advance.
I followed this tutorial: https://blog.tensorflow.org/2021/03/transfer-learning-for-audio-data-with-yamnet.html
It sounds like your data is not distributed evenly across the classes, so your model is overfitting to the "third class" in your dataset. I would look into splitting the data into training, validation, and test sets with a stratified method, so that every class is represented in each of them.
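To make the suggestion concrete, here is a minimal, dependency-free sketch of a stratified split. It first counts the samples per class (to check whether one class dominates) and then splits each class separately so that the proportions carry over to both subsets. The `stratified_split` helper and the toy `labels` list are made up for illustration and are not taken from your code; in practice scikit-learn's `StratifiedKFold` or `train_test_split(..., stratify=y)` would do the same job.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class share in both."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test sample per class when the class has 2+ samples.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, deliberately imbalanced labels: class 2 dominates.
labels = [0] * 10 + [1] * 10 + [2] * 80

# Step 1: inspect the class distribution before training anything.
print("class counts:", Counter(labels))

# Step 2: split so every class appears in both subsets.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train counts:", train_counts)
print("test counts:", test_counts)
```

If the class counts turn out to be as skewed as in this toy example, a classifier can score well just by always predicting the majority class, which would match the behaviour you describe. Note that a stratified split only keeps the proportions consistent across subsets; it does not remove the imbalance itself.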
Here is a resource on stratified k-fold: