How do class weights intervene in the backpropagation equations of a neural network?

I'm implementing a classification neural network from scratch (no libraries except numpy), following this tutorial: http://neuralnetworksanddeeplearning.com/chap2.html. However, I am dealing with an unbalanced dataset, so I compute a class weight for each category/class of the dataset. I compute the class weight w_i of the i-th category with the following formula (taken from this article: https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/):

w_i = n / (N * n_i)

where:

  • n is the total number of samples in the dataset;
  • n_i is the number of samples of category i;
  • N is the number of categories.
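
For concreteness, here is a small numpy sketch of this formula applied to an array of integer labels (the labels array below is just a toy example, not my actual data):

import numpy as np

# Toy integer labels for an unbalanced dataset (example values only)
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])

n = labels.shape[0]                              # n   : total number of samples
classes, counts = np.unique(labels, return_counts=True)
N = classes.shape[0]                             # N   : number of categories
class_weights = n / (N * counts)                 # w_i = n / (N * n_i)

# The majority class gets a weight below 1, minority classes get weights above 1
print(dict(zip(classes, class_weights)))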

Now that I have these class weights, how do they intervene in the backpropagation algorithm? Where exactly must I use these w_i coefficients in the backpropagation formulas, and how?

My first idea (that I got from this article: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/class-weights) is that the loss and the error simply get multiplied by the class weight of the category that was expected for this sample:

# y : expected output of the network
# activations : array of outputs at each layer, thus activations[-1]
#               is the output of the network
# zs : inputs of each layer (i.e. the input vector at a layer before
#      it goes through the activation function)

# y and activations[-1] are of shape : (number of categories, 1)

# Get the class weight of the expected category
class_id = np.argmax(y)
cw = class_weights[class_id]

# Loss (only used for metrics) : Mean Squared Error (MSE)
loss = 1 / y.shape[0] * np.sum((y - activations[-1]) ** 2)
# Apply class weight on the loss
loss *= cw

# (simplified) MSE cost derivative (error)
error = (activations[-1] - y) * self.dactivation(zs[-1])
# Applying the class weight to the error
error *= cw

# Gradients of the last layer
w_grads[-1] = np.matmul(error, activations[-2].transpose())
b_grads[-1] = error
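
For context, the per-sample gradients above are then summed and averaged over the mini-batch in the weight update, following the structure of the tutorial's update_mini_batch. A simplified sketch (the function and parameter names here, such as backprop and eta, are placeholders rather than my exact code):

import numpy as np

def update_mini_batch(weights, biases, mini_batch, eta, backprop):
    # backprop(x, y) is assumed to return the per-layer gradient lists
    # (w_grads, b_grads) computed as in the snippet above, i.e. with the
    # error already multiplied by the class weight of the expected category.
    w_sum = [np.zeros_like(w) for w in weights]
    b_sum = [np.zeros_like(b) for b in biases]
    for x, y in mini_batch:
        w_grads, b_grads = backprop(x, y)
        w_sum = [ws + wg for ws, wg in zip(w_sum, w_grads)]
        b_sum = [bs + bg for bs, bg in zip(b_sum, b_grads)]
    # Gradient-descent step with the class-weighted gradients averaged over the batch
    weights = [w - (eta / len(mini_batch)) * ws for w, ws in zip(weights, w_sum)]
    biases = [b - (eta / len(mini_batch)) * bs for b, bs in zip(biases, b_sum)]
    return weights, biases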

However, this results in very slow training; here is an extract of the training logs:

Epoch 0: 0.0% (0 / 21892) | loss : 0.1675
Epoch 1: 0.0% (4 / 21892) | loss : 0.1578
Epoch 2: 0.0% (1 / 21892) | loss : 0.1521
Epoch 3: 0.4% (88 / 21892) | loss : 0.1460
Epoch 4: 4.8% (1045 / 21892) | loss : 0.1352
Epoch 5: 6.9% (1514 / 21892) | loss : 0.1230
Epoch 6: 7.9% (1726 / 21892) | loss : 0.1134
Epoch 7: 7.9% (1740 / 21892) | loss : 0.1060
Epoch 8: 8.3% (1807 / 21892) | loss : 0.1006
Epoch 9: 8.6% (1893 / 21892) | loss : 0.0963
Epoch 10: 9.5% (2076 / 21892) | loss : 0.0927
Epoch 11: 10.2% (2228 / 21892) | loss : 0.0894
Epoch 12: 12.9% (2829 / 21892) | loss : 0.0865
Epoch 13: 15.9% (3470 / 21892) | loss : 0.0838
Epoch 14: 18.8% (4109 / 21892) | loss : 0.0815
Epoch 15: 26.0% (5701 / 21892) | loss : 0.0795

The accuracy is calculated on a test dataset and is at 0% after the first epoch (shouldn't it be around 50%?), then increases very slowly. The batch size is 200 samples. Training a Keras model with the same layers, activation functions, loss, and class weights on the same dataset is much faster and more effective (reaching up to 92% accuracy after 30 epochs).
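
For reference, here is roughly how the same class weights can be given to a Keras model through the class_weight argument of model.fit (layer sizes, optimizer, and toy data below are illustrative placeholders, not my exact configuration):

import numpy as np
from tensorflow import keras

# Toy placeholders; in the real run these come from my dataset
n_features, n_classes = 20, 3
x_train = np.random.rand(1000, n_features)
y_train = keras.utils.to_categorical(np.random.randint(n_classes, size=1000), n_classes)

# Same formula as above: w_i = n / (N * n_i), passed as a dict keyed by class index
counts = y_train.sum(axis=0)
class_weight = {i: y_train.shape[0] / (n_classes * c) for i, c in enumerate(counts)}

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="sigmoid"),
    keras.layers.Dense(n_classes, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])

# Keras multiplies each sample's contribution to the loss by the weight of its class
model.fit(x_train, y_train, batch_size=200, epochs=30, class_weight=class_weight)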

Thanks in advance for your help.
