当前位置：文江博客话题详情

如何更新神经网络反向传播中的偏差？

发布于 2024-09-25 04:55:41 字数 105 浏览 5 评论 0原文

有人可以向我解释如何在整个反向传播过程中更新偏差吗？

我读了很多书，但找不到偏见更新！

我知道偏差是额外输入 1，并附加一个权重（对于每个神经元）。一定有一个公式。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

泪之魂 2024-10-02 04:55:41

遵循 Rojas 1996，第 7 章的表示法，反向传播计算误差函数 E 的偏导数（又称为成本，又称为损失）

∂E/∂w[i,j] = delta[j] * o[i]

，其中 w[i,j] 是神经元 i 之间连接的权重code> 和 j，j 在网络中比 i 高一层，o[i] 是i 的输出（激活）（在“输入层”的情况下，这只是所考虑的训练样本中特征 i 的值）。如何确定delta在任何教科书上都有给出，并且取决于激活函数，所以这里不再重复。

这些值随后可用于权重更新，例如

// update rule for vanilla online gradient descent
w[i,j] -= gamma * o[i] * delta[j]

，其中gamma 是学习率。

偏差权重的规则非常相似，只是没有来自前一层的输入。相反，偏差（概念上）是由固定激活值为 1 的神经元的输入引起的。因此，偏差权重的更新规则是

bias[j] -= gamma_bias * 1 * delta[j]

其中 bias[j] 是神经元 < 上的偏差权重code>j，显然可以省略与 1 的乘法，并且 gamma_bias 可以设置为 gamma 或其他值。如果我没记错的话，较低的值是首选，尽管我不确定其理论依据。

Following the notation of Rojas 1996, chapter 7, backpropagation computes partial derivatives of the error function E (aka cost, aka loss)

∂E/∂w[i,j] = delta[j] * o[i]

where w[i,j] is the weight of the connection between neurons i and j, j being one layer higher in the network than i, and o[i] is the output (activation) of i (in the case of the "input layer", that's just the value of feature i in the training sample under consideration). How to determine delta is given in any textbook and depends on the activation function, so I won't repeat it here.

These values can then be used in weight updates, e.g.

// update rule for vanilla online gradient descent
w[i,j] -= gamma * o[i] * delta[j]

where gamma is the learning rate.

The rule for bias weights is very similar, except that there's no input from a previous layer. Instead, bias is (conceptually) caused by input from a neuron with a fixed activation of 1. So, the update rule for bias weights is

bias[j] -= gamma_bias * 1 * delta[j]

where bias[j] is the weight of the bias on neuron j, the multiplication with 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification of that.

回复收藏 0 原文

南汐寒笙箫 2024-10-02 04:55:41

您更改每个单独权重和偏差的量将是成本函数相对于每个单独权重和每个单独偏差的偏导数。

∂C/∂(index of bias in network)

由于您的成本函数可能并不明确依赖于各个权重和值（例如，成本可能等于（网络输出 - 预期输出）^2），因此您需要将每个权重和偏差的偏导数与您想要的值相关联。知道，即神经元的激活值（输出）。这是执行此操作的一个很好的指南：

https://medium.com/@erikhallstrm /backpropagation-from-the-beginning-77356edf427d

本指南清楚地说明了如何做这些事情，但有时可能缺乏解释。当我阅读上面链接的指南时，我发现阅读本书的第 1 章和第 2 章非常有帮助：

http://neuralnetworksanddeeplearning .com/chap1.html
（为回答您的问题提供基本背景）

http://neuralnetworksanddeeplearning.com/chap2.html
（回答你的问题）

基本上，偏差的更新方式与权重的更新方式相同：基于多维点的成本函数的梯度来确定变化。

将您的网络试图解决的问题视为多维丘陵和山谷（梯度）的景观。此景观以图形方式表示您的成本如何随着权重和偏差的变化而变化。神经网络的目标是到达该景观的最低点，从而找到最小的成本并最小化错误。如果您将网络想象为试图到达这些梯度底部（即梯度下降）的旅行者，那么您将改变每个权重（和偏差）的量与斜率（函数的梯度）有关旅行者目前正在爬下。旅行者的准确位置由多维坐标点（weight1、weight2、weight3、...weight_n）给出，其中偏差可以被认为是另一种权重。将网络的权重/偏差视为网络成本函数的变量，清楚地表明必须使用 ∂C/∂（网络偏差指数）。

The amount you change each individual weight and bias will be the partial derivative of your cost function in relation to each individual weight and each individual bias.

∂C/∂(index of bias in network)

Since your cost function probably doesn't explicitly depend on individual weights and values (Cost might equal (network output - expected output)^2, for example), you'll need to relate the partial derivatives of each weight and bias to something you know, i.e. the activation values (outputs) of neurons. Here's a great guide to doing this:

https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d

This guide states how to do these things clearly, but can sometimes be lacking on explanation. I found it very helpful to read chapters 1 and 2 of this book as I read the guide linked above:

http://neuralnetworksanddeeplearning.com/chap1.html
(provides essential background for the answer to your question)

http://neuralnetworksanddeeplearning.com/chap2.html
(answers your question)

Basically, biases are updated in the same way that weights are updated: a change is determined based on the gradient of the cost function at a multi-dimensional point.

Think of the problem your network is trying to solve as being a landscape of multi-dimensional hills and valleys (gradients). This landscape is a graphical representation of how your cost changes with changing weights and biases. The goal of a neural network is to reach the lowest point in this landscape, thereby finding the smallest cost and minimizing error. If you imagine your network as a traveler trying to reach the bottom of these gradients (i.e. Gradient Descent), then the amount you will change each weight (and bias) by is related to the the slope of the incline (gradient of the function) that the traveler is currently climbing down. The exact location of the traveler is given by a multi-dimensional coordinate point (weight1, weight2, weight3, ... weight_n), where the bias can be thought of as another kind of weight. Thinking of the weights/biases of a network as the variables for the network's cost function make it clear that ∂C/∂(index of bias in network) must be used.

回复收藏 0 原文