How do you update the biases during backpropagation in a neural network?

Posted on 2024-09-25 04:55:41


Could someone please explain to me how to update the bias throughout backpropagation?

I've read quite a few books, but can't find bias updating!

I understand that bias is an extra input of 1 with a weight attached to it (for each neuron). There must be a formula.


Comments (4)

泪之魂 2024-10-02 04:55:41


Following the notation of Rojas 1996, chapter 7, backpropagation computes partial derivatives of the error function E (aka cost, aka loss)

∂E/∂w[i,j] = delta[j] * o[i]

where w[i,j] is the weight of the connection between neurons i and j, j being one layer higher in the network than i, and o[i] is the output (activation) of i (in the case of the "input layer", that's just the value of feature i in the training sample under consideration). How to determine delta is given in any textbook and depends on the activation function, so I won't repeat it here.

These values can then be used in weight updates, e.g.

// update rule for vanilla online gradient descent
w[i,j] -= gamma * o[i] * delta[j]

where gamma is the learning rate.

The rule for bias weights is very similar, except that there's no input from a previous layer. Instead, bias is (conceptually) caused by input from a neuron with a fixed activation of 1. So, the update rule for bias weights is

bias[j] -= gamma_bias * 1 * delta[j]

where bias[j] is the weight of the bias on neuron j, the multiplication with 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification of that.
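A minimal sketch of these update rules for one fully connected layer, assuming the delta[j] values for the layer have already been computed (the function name update_layer and the NumPy arrays are illustrative, not part of the answer):

import numpy as np

def update_layer(W, b, o_prev, delta, gamma, gamma_bias=None):
    # W       : (n_prev, n) weight matrix, W[i, j] connects neuron i to neuron j
    # b       : (n,) bias weights, one per neuron j
    # o_prev  : (n_prev,) activations o[i] of the previous layer
    # delta   : (n,) error terms delta[j] for this layer
    # gamma   : learning rate; gamma_bias defaults to the same value
    if gamma_bias is None:
        gamma_bias = gamma
    # dE/dw[i,j] = delta[j] * o[i], so an outer product updates all weights at once
    W -= gamma * np.outer(o_prev, delta)
    # the bias acts like a weight on a constant input of 1, so dE/dbias[j] = delta[j]
    b -= gamma_bias * delta
    return W, b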

南汐寒笙箫 2024-10-02 04:55:41


The amount you change each individual weight and bias will be the partial derivative of your cost function in relation to each individual weight and each individual bias.

∂C/∂(index of bias in network)

Since your cost function probably doesn't explicitly depend on individual weights and values (Cost might equal (network output - expected output)^2, for example), you'll need to relate the partial derivatives of each weight and bias to something you know, i.e. the activation values (outputs) of neurons. Here's a great guide to doing this:

https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d

This guide states how to do these things clearly, but can sometimes be lacking on explanation. I found it very helpful to read chapters 1 and 2 of this book as I read the guide linked above:

http://neuralnetworksanddeeplearning.com/chap1.html
(provides essential background for the answer to your question)

http://neuralnetworksanddeeplearning.com/chap2.html
(answers your question)

Basically, biases are updated in the same way that weights are updated: a change is determined based on the gradient of the cost function at a multi-dimensional point.

Think of the problem your network is trying to solve as being a landscape of multi-dimensional hills and valleys (gradients). This landscape is a graphical representation of how your cost changes with changing weights and biases. The goal of a neural network is to reach the lowest point in this landscape, thereby finding the smallest cost and minimizing error. If you imagine your network as a traveler trying to reach the bottom of these gradients (i.e. Gradient Descent), then the amount you will change each weight (and bias) by is related to the slope of the incline (gradient of the function) that the traveler is currently climbing down. The exact location of the traveler is given by a multi-dimensional coordinate point (weight1, weight2, weight3, ... weight_n), where the bias can be thought of as another kind of weight. Thinking of the weights/biases of a network as the variables of the network's cost function makes it clear that ∂C/∂(index of bias in network) must be used.
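As a concrete illustration of that partial derivative (a made-up single-neuron example of mine, not taken from the linked guide), here is the bias gradient of a sigmoid neuron with squared-error cost, checked against a finite-difference estimate:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, y):
    # squared-error cost of a single sigmoid neuron: (output - expected)^2
    return (sigmoid(np.dot(w, x) + b) - y) ** 2

x = np.array([0.5, -1.2])   # inputs (made-up values)
w = np.array([0.8, 0.3])    # weights
b, y = 0.1, 1.0             # bias and expected output

# analytic gradients via the chain rule
a = sigmoid(np.dot(w, x) + b)
dC_da = 2 * (a - y)         # derivative of (a - y)^2 w.r.t. the activation
da_dz = a * (1 - a)         # derivative of the sigmoid
dC_db = dC_da * da_dz       # the bias feeds in with a fixed input of 1
dC_dw = dC_da * da_dz * x   # each weight is also scaled by its input

# finite-difference check of the bias gradient
eps = 1e-6
numeric = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
print(dC_db, numeric)       # the two values should agree closely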

舂唻埖巳落 2024-10-02 04:55:41


I understand that the function of the bias is to level-adjust the input values. Below is what happens inside the neuron. The activation function would of course produce the final output, but it is left out for clarity.

  • O = W1 I1 + W2 I2 + W3 I3

In a real neuron, something already happens at the synapses: the input data is level-adjusted by the average of the samples and scaled by their deviation. Thus the input data is normalized, and with equal weights the inputs will have the same effect. The normalized In is calculated from the raw data in (n is the index).

  • Bn = average(in); Sn = 1/stdev(in); In = (in + Bn) Sn

However, this does not need to be performed separately, because the neuron weights and bias can perform the same function. When you substitute In with in, you get the new formula

  • O = w1 i1 + w2 i2 + w3 i3+ wbs

The last term, wbs, is the bias, and the new weights wn are

  • wbs = W1 B1 S1 + W2 B2 S2 + W3 B3 S3
  • wn = Wn Sn

So a bias does exist, and it will/should be adjusted automagically by backpropagation.
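A quick numeric sketch of that substitution (made-up values, keeping the answer's sign convention for Bn): folding the per-input offsets and scales into new weights plus a single bias term leaves the neuron's pre-activation output unchanged.

import numpy as np

i_raw = np.array([2.0, -1.0, 4.0])   # raw inputs (made-up values)
W = np.array([0.5, 1.2, -0.3])       # original weights

B = np.array([0.1, 0.2, -0.4])       # stands in for average(in)
S = np.array([1.5, 0.8, 2.0])        # stands in for 1/stdev(in)

# path 1: normalize first, then apply the original weights
I_norm = (i_raw + B) * S
O1 = np.dot(W, I_norm)

# path 2: fold the normalization into new weights and a single bias term
w_new = W * S                        # wn = Wn Sn
wbs = np.sum(W * B * S)              # wbs = W1 B1 S1 + W2 B2 S2 + W3 B3 S3
O2 = np.dot(w_new, i_raw) + wbs

print(O1, O2)                        # both paths give the same output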

惜醉颜 2024-10-02 04:55:41


Bias for a specific neuron += error * learning rate

where error represents the error of the input neurons in the next layer
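A one-line sketch of this rule applied to a whole layer, assuming error is a vector of per-neuron error terms (the names and values are made up; the sign depends on how the error is defined, which is why the first answer subtracts instead of adds):

import numpy as np

learning_rate = 0.1
bias = np.zeros(4)                         # one bias per neuron in the layer
error = np.array([0.2, -0.1, 0.05, 0.3])   # per-neuron error terms (made up)

bias += error * learning_rate              # the answer's rule, element-wise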
