How do class weights intervene in the backpropagation equations of a neural network?
I'm implementing a classification neural network from scratch (no libraries except numpy), following this tutorial: http://neuralnetworksanddeeplearning.com/chap2.html . However, I am dealing with an unbalanced dataset, so I compute a class weight for each category/class of the dataset. I compute the class weight w_i of the i-th category with the following formula (taken from this article: https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/ ):
w_i = n / (N * n_i)
where:
- n is the total number of samples in the dataset;
- n_i is the number of samples of category i;
- N is the number of categories.
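For concreteness, here is a minimal numpy sketch of that formula applied to an array of integer labels (the names labels and class_weights are purely illustrative, not my actual code):

import numpy as np

# labels : one integer class id per training sample (illustrative data)
labels = np.array([0, 0, 0, 0, 1, 1, 2])

n = labels.shape[0]                               # total number of samples
classes, counts = np.unique(labels, return_counts=True)
N = classes.shape[0]                              # number of categories

# w_i = n / (N * n_i) for every category i
class_weights = n / (N * counts)
print(dict(zip(classes, class_weights)))          # {0: 0.583..., 1: 1.166..., 2: 2.333...}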
Now that I have these class weights, how do they intervene in the backpropagation algorithm? Where exactly must I use these w_i coefficients in the backpropagation formulas, and how?
My first idea (taken from this article: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/class-weights ) is that the loss and the error simply get multiplied by the class weight of the category expected for this sample:
# y : expected output of the network
# activations : list of outputs of each layer, so activations[-1]
#               is the output of the network
# zs : weighted inputs of each layer (i.e. the vector a layer receives
#      before it goes through the activation function)
# y and activations[-1] are of shape : (number of categories, 1)

# Get the class weight of the expected category
class_id = np.argmax(y)
cw = class_weights[class_id]

# Loss (only used for metrics) : Mean Squared Error (MSE)
loss = 1 / y.shape[0] * np.sum((y - activations[-1]) ** 2)
# Apply the class weight to the loss
loss *= cw

# (simplified) MSE cost derivative of the output layer (error)
error = (activations[-1] - y) * self.dactivation(zs[-1])
# Apply the class weight to the error
error *= cw

# Gradients of the last layer
w_grads[-1] = np.matmul(error, activations[-2].transpose())
b_grads[-1] = error
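(For context, the rest of my backward pass is the standard loop from the tutorial; the sketch below is only an approximation of that part, with self.weights, self.num_layers and self.dactivation assumed as in the tutorial. It just shows where the class-weighted error ends up: since error was already multiplied by cw above, every earlier layer's gradients get scaled by cw as well through the chain rule.)

# Sketch of the remaining backward pass (loop structure from the tutorial above);
# self.weights, self.num_layers and self.dactivation are assumed to exist as in
# that tutorial. Because error already contains the factor cw, every w_grads[-l]
# and b_grads[-l] computed below is scaled by cw as well.
for l in range(2, self.num_layers):
    z = zs[-l]
    error = np.matmul(self.weights[-l + 1].transpose(), error) * self.dactivation(z)
    w_grads[-l] = np.matmul(error, activations[-l - 1].transpose())
    b_grads[-l] = error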
However, this results in very slow training; here is an extract of the training logs:
Epoch 0: 0.0% (0 / 21892) | loss : 0.1675
Epoch 1: 0.0% (4 / 21892) | loss : 0.1578
Epoch 2: 0.0% (1 / 21892) | loss : 0.1521
Epoch 3: 0.4% (88 / 21892) | loss : 0.1460
Epoch 4: 4.8% (1045 / 21892) | loss : 0.1352
Epoch 5: 6.9% (1514 / 21892) | loss : 0.1230
Epoch 6: 7.9% (1726 / 21892) | loss : 0.1134
Epoch 7: 7.9% (1740 / 21892) | loss : 0.1060
Epoch 8: 8.3% (1807 / 21892) | loss : 0.1006
Epoch 9: 8.6% (1893 / 21892) | loss : 0.0963
Epoch 10: 9.5% (2076 / 21892) | loss : 0.0927
Epoch 11: 10.2% (2228 / 21892) | loss : 0.0894
Epoch 12: 12.9% (2829 / 21892) | loss : 0.0865
Epoch 13: 15.9% (3470 / 21892) | loss : 0.0838
Epoch 14: 18.8% (4109 / 21892) | loss : 0.0815
Epoch 15: 26.0% (5701 / 21892) | loss : 0.0795
The accuracy is calculated on a test dataset, and it is at 0% after the first epoch (shouldn't it be around 50%?), then increases very slowly. The batch size is 200 samples. Training a Keras model with the same layers, activation functions, loss and class weights on the same dataset is much faster and more effective (reaching 92% accuracy after 30 epochs).
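(For reference, class weights are typically passed to Keras through the class_weight argument of model.fit, which is what the baseline relies on. The sketch below is only illustrative: the placeholder data, layer sizes and hyper-parameters are not my actual setup.)

import numpy as np
from tensorflow import keras

# Illustrative placeholders only, not the real dataset / architecture
input_dim, num_categories = 20, 3
x_train = np.random.rand(1000, input_dim)
y_train = keras.utils.to_categorical(
    np.random.randint(num_categories, size=1000), num_categories)
class_weights = [0.58, 1.17, 2.33]          # e.g. from w_i = n / (N * n_i)

model = keras.Sequential([
    keras.layers.Dense(64, activation="sigmoid", input_shape=(input_dim,)),
    keras.layers.Dense(num_categories, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])

# Keras expects the class weights as a {class_id: weight} dict
model.fit(x_train, y_train, batch_size=200, epochs=30,
          class_weight={i: w for i, w in enumerate(class_weights)})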
Thanks in advance for your help.