How do class weights intervene in the backpropagation equations of a neural network?

I'm implementing a classification neural network from scratch (no libraries except numpy), following this tutorial: http://neuralnetworksanddeeplearning.com/chap2.html. However, I am dealing with an unbalanced dataset, so I compute a class weight for each category/class of the dataset. I compute the class weight w_i of the i-th category with the following formula (taken from this article: https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/):

w_i = n / (N * n_i)

where:

  • n is the total number of samples in the dataset;
  • n_i is the number of samples of category i;
  • N is the number of categories.
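
For concreteness, here is a small numpy sketch of this formula applied to an array of integer labels (the labels array below is just a toy example, not my actual data):

import numpy as np

# Toy integer labels for an unbalanced dataset (example values only)
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])

n = labels.shape[0]                              # n   : total number of samples
classes, counts = np.unique(labels, return_counts=True)
N = classes.shape[0]                             # N   : number of categories
class_weights = n / (N * counts)                 # w_i = n / (N * n_i)

# The majority class gets a weight below 1, minority classes get weights above 1
print(dict(zip(classes, class_weights)))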

Now that I have these class weights, how do they intervene in the backpropagation algorithm? Where exactly must I use these w_i coefficients in the backpropagation formulas, and how?

My first idea (that I got from this article: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/class-weights) is that the loss and the error simply get multiplied by the class weight of the category that was expected for this sample:

# y : expected output of the network
# activations : array of outputs at each layer, thus activations[-1]
#               is the output of the network
# zs : inputs of each layer (i.e. the input vector at a layer before
#      it goes through the activation function)

# y and activations[-1] are of shape : (number of categories, 1)

# Get the class weight of the expected category
class_id = np.argmax(y)
cw = class_weights[class_id]

# Loss (only used for metrics) : Mean Squared Error (MSE)
loss = 1 / y.shape[0] * np.sum((y - activations[-1]) ** 2)
# Apply class weight on the loss
loss *= cw

# (simplified) MSE cost derivative (error)
error = (activations[-1] - y) * self.dactivation(zs[-1])
# Applying the class weight to the error
error *= cw

# Gradients of the last layer
w_grads[-1] = np.matmul(error, activations[-2].transpose())
b_grads[-1] = error
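
For context, the per-sample gradients above are then summed and averaged over the mini-batch in the weight update, following the structure of the tutorial's update_mini_batch. A simplified sketch (the function and parameter names here, such as backprop and eta, are placeholders rather than my exact code):

import numpy as np

def update_mini_batch(weights, biases, mini_batch, eta, backprop):
    # backprop(x, y) is assumed to return the per-layer gradient lists
    # (w_grads, b_grads) computed as in the snippet above, i.e. with the
    # error already multiplied by the class weight of the expected category.
    w_sum = [np.zeros_like(w) for w in weights]
    b_sum = [np.zeros_like(b) for b in biases]
    for x, y in mini_batch:
        w_grads, b_grads = backprop(x, y)
        w_sum = [ws + wg for ws, wg in zip(w_sum, w_grads)]
        b_sum = [bs + bg for bs, bg in zip(b_sum, b_grads)]
    # Gradient-descent step with the class-weighted gradients averaged over the batch
    weights = [w - (eta / len(mini_batch)) * ws for w, ws in zip(weights, w_sum)]
    biases = [b - (eta / len(mini_batch)) * bs for b, bs in zip(biases, b_sum)]
    return weights, biases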

However, this results in very slow training; here is an extract of the training logs:

Epoch 0: 0.0% (0 / 21892) | loss : 0.1675
Epoch 1: 0.0% (4 / 21892) | loss : 0.1578
Epoch 2: 0.0% (1 / 21892) | loss : 0.1521
Epoch 3: 0.4% (88 / 21892) | loss : 0.1460
Epoch 4: 4.8% (1045 / 21892) | loss : 0.1352
Epoch 5: 6.9% (1514 / 21892) | loss : 0.1230
Epoch 6: 7.9% (1726 / 21892) | loss : 0.1134
Epoch 7: 7.9% (1740 / 21892) | loss : 0.1060
Epoch 8: 8.3% (1807 / 21892) | loss : 0.1006
Epoch 9: 8.6% (1893 / 21892) | loss : 0.0963
Epoch 10: 9.5% (2076 / 21892) | loss : 0.0927
Epoch 11: 10.2% (2228 / 21892) | loss : 0.0894
Epoch 12: 12.9% (2829 / 21892) | loss : 0.0865
Epoch 13: 15.9% (3470 / 21892) | loss : 0.0838
Epoch 14: 18.8% (4109 / 21892) | loss : 0.0815
Epoch 15: 26.0% (5701 / 21892) | loss : 0.0795

The accuracy is calculated on a test dataset and is at 0% after the first epoch (shouldn't it be around 50%?), then increases very slowly. The batch size is 200 samples. Training a Keras model with the same layers, activation functions, loss, and class weights on the same dataset is much faster and more effective (reaching up to 92% accuracy after 30 epochs).
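
For reference, here is roughly how the same class weights can be given to a Keras model through the class_weight argument of model.fit (layer sizes, optimizer, and toy data below are illustrative placeholders, not my exact configuration):

import numpy as np
from tensorflow import keras

# Toy placeholders; in the real run these come from my dataset
n_features, n_classes = 20, 3
x_train = np.random.rand(1000, n_features)
y_train = keras.utils.to_categorical(np.random.randint(n_classes, size=1000), n_classes)

# Same formula as above: w_i = n / (N * n_i), passed as a dict keyed by class index
counts = y_train.sum(axis=0)
class_weight = {i: y_train.shape[0] / (n_classes * c) for i, c in enumerate(counts)}

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="sigmoid"),
    keras.layers.Dense(n_classes, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])

# Keras multiplies each sample's contribution to the loss by the weight of its class
model.fit(x_train, y_train, batch_size=200, epochs=30, class_weight=class_weight)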

Thanks in advance for your help.
