How does the back-propagation training algorithm work?
I've been trying to learn how back-propagation works with neural networks, but yet to find a good explanation from a less technical aspect.
How does back-propagation work? How does it learn from a training dataset provided? I will have to code this, but until then I need to gain a stronger understanding of it.
4 Answers
Back-propagation works with a logic very similar to that of feed-forward; the difference is the direction of data flow. In the feed-forward step, you have the inputs and the output observed from them, and you propagate the values forward through the network.
In the back-propagation step, you cannot know the error made in every neuron, only the errors in the output layer. Calculating the errors of the output nodes is straightforward: you take the difference between the neuron's output and the actual (target) output for that instance in the training set. The neurons in the hidden layers must correct their errors from this, so you have to pass the error values back to them. From these values, the hidden neurons can update their weights and other parameters using the weighted sum of the errors from the layer ahead.
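To make the direction of error flow concrete, here is a minimal sketch of one forward and one backward pass through a tiny network. The 2-3-1 shape, the sigmoid activation, the weight values, and the learning rate are all assumptions for illustration, not part of the original answer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny 2-3-1 network with made-up weights (assumption for illustration).
x = np.array([0.5, 0.1])            # one training input
t = np.array([1.0])                 # target output for this instance

W1 = np.array([[0.2, -0.4],         # hidden-layer weights (3 x 2)
               [0.7,  0.1],
               [-0.3, 0.5]])
W2 = np.array([[0.6, -0.2, 0.8]])   # output-layer weights (1 x 3)

# Feed-forward: propagate the input forward through the layers.
h = sigmoid(W1 @ x)                 # hidden activations
y = sigmoid(W2 @ h)                 # network output

# Back-propagation: only the output error is known directly...
delta_out = (y - t) * y * (1 - y)   # output-layer error (sigmoid derivative)

# ...so each hidden neuron receives a weighted sum of the errors ahead of it.
delta_hidden = (W2.T @ delta_out) * h * (1 - h)

# Gradient-descent weight updates (learning rate is an assumed parameter).
lr = 0.5
W2 -= lr * np.outer(delta_out, h)
W1 -= lr * np.outer(delta_hidden, x)
```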
A step-by-step demo of feed-forward and back-propagation steps can be found here.
Edit
If you're a beginner to neural networks, you can start by learning about the Perceptron and then advance to neural networks, which are in fact multilayer perceptrons.
High-level description of the backpropagation algorithm
Backpropagation is trying to do a gradient descent on the error surface of the neural network, adjusting the weights with dynamic programming techniques to keep the computations tractable.
I will try to explain, in high-level terms, all of the concepts just mentioned.
Error surface
If you have a neural network with, say, N neurons in the output layer, that means your output is really an N-dimensional vector, and that vector lives in an N-dimensional space (or on an N-dimensional surface). So does the "correct" output that you're training against. So does the difference between your "correct" answer and the actual output.
That difference, with suitable conditioning (especially some consideration of absolute values), is the error vector, living on the error surface.
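One common way to make that precise (an assumption for illustration; the answer doesn't commit to a specific form) is the squared-error function, whose value is what training tries to drive toward zero:

$$E(\mathbf{w}) \;=\; \tfrac{1}{2}\,\lVert \mathbf{y}(\mathbf{w}) - \mathbf{t} \rVert^{2} \;=\; \tfrac{1}{2}\sum_{i=1}^{N}\bigl(y_i(\mathbf{w}) - t_i\bigr)^{2}$$

Here y(w) is the network's N-dimensional output for the current weights w, and t is the target vector.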
Gradient descent
With that concept, you can think of training the neural network as the process of adjusting the weights of your neurons so that the error function is small, ideally zero. Conceptually, you do this with calculus. If you only had one output and one weight, this would be simple -- take a few derivatives, which would tell you which "direction" to move, and make an adjustment in that direction.
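A minimal sketch of that one-output, one-weight case, using a toy quadratic error function (the target value, learning rate, and step count are made up for illustration):

```python
# Gradient descent on a toy 1-D error function E(w) = (w - 3)^2.
def error(w):
    return (w - 3.0) ** 2

def error_gradient(w):
    return 2.0 * (w - 3.0)      # dE/dw, the "slope" at the current weight

w = 0.0                          # start somewhere on the error surface
learning_rate = 0.1
for _ in range(50):
    w -= learning_rate * error_gradient(w)   # step "downhill"

print(w)   # close to 3.0, where the error is (nearly) zero
```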
But you don't have one neuron, you have N of them, and substantially more input weights.
The principle is the same, except instead of using calculus on lines looking for slopes that you can picture in your head, the equations become vector algebra expressions that you can't easily picture. The term gradient is the multi-dimensional analogue to slope on a line, and descent means you want to move down that error surface until the errors are small.
Dynamic programming
There's another problem, though -- if you have more than one layer, you can't easily see how changing the weights in some non-output layer affects the actual output.
Dynamic programming is a bookkeeping method to help track what's going on. At the very highest level, if you naively try to do all this vector calculus, you end up calculating some derivatives over and over again. The modern backpropagation algorithm avoids some of that, and it so happens that you update the output layer first, then the second to last layer, etc. Updates are propagating backwards from the output, hence the name.
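In symbols, the quantity that gets computed once per layer and then reused is the error term δ. A standard form (assuming a squared-error cost and an elementwise activation σ, neither of which the answer pins down) is:

$$\delta^{L} = (\mathbf{a}^{L} - \mathbf{t}) \odot \sigma'(\mathbf{z}^{L}), \qquad \delta^{l} = \bigl(W^{l+1}\bigr)^{\top} \delta^{l+1} \odot \sigma'(\mathbf{z}^{l}), \qquad \frac{\partial E}{\partial W^{l}} = \delta^{l} \bigl(\mathbf{a}^{l-1}\bigr)^{\top}$$

Each δ at layer l depends only on the δ of the layer ahead of it, so you sweep once backwards from the output layer and never recompute the same derivative -- exactly the dynamic-programming bookkeeping described above.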
So, if you're lucky enough to have been exposed to gradient descent or vector calculus before, then hopefully that clicked.
The full derivation of backpropagation can be condensed into about a page of tight symbolic math, but it's hard to get the sense of the algorithm without a high-level description. (It's downright intimidating, in my opinion.) If you haven't got a good handle on vector calculus, then, sorry, the above probably wasn't helpful. But to get backpropagation to actually work, it's not necessary to understand the full derivation.
I found the following paper (by Rojas) very helpful when I was trying to understand this material, even though it's a big PDF of one chapter of his book.
http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf
I'll try to explain without delving too much into code or math.
Basically, you compute the classification from the neural network, and compare to the known value. This gives you an error at the output node.
Now, from the output node, we have N incoming links from other nodes. We propagate the error to the last layer before the output node, then propagate it down to the next layer (when there is more than one uplink, you sum the errors), and then recursively propagate it back to the first layer.
To adjust the weights for training, for each node you basically do the following:
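The code block this step refers to appears to have been dropped from the page. What follows is a reconstruction sketch of a typical per-node delta-rule update with a momentum term; the variable names (delta, inputs, prev_changes) are assumptions suggested by the surrounding text, not the original snippet.

```python
def update_node_weights(weights, inputs, delta, prev_changes,
                        learningRate, alpha):
    """Update one node's incoming weights (reconstruction sketch).

    delta is the node's back-propagated error, defined with the
    (target - output) sign convention so the update is additive.
    alpha weights the previous change (momentum).
    """
    new_weights, new_changes = [], []
    for w, x, prev in zip(weights, inputs, prev_changes):
        change = learningRate * delta * x + alpha * prev   # gradient step + momentum
        new_weights.append(w + change)
        new_changes.append(change)
    return new_weights, new_changes
```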
learningRate and alpha are parameters you can set to adjust how quickly it hones in on a solution vs. how (hopefully) accurately you solve it in the end.
It is easy to understand if you look at the computation graph, which shows how the gradient of the cost (loss) function with respect to each weight is calculated by the chain rule (which is basically what back-propagation is), and then at the mechanism for adjusting every weight in the neural network using gradient descent, where the gradient is the one calculated by back-propagation. That is, each weight is adjusted proportionally, based on how strongly it affects the final cost. It is too much to explain here, but here is the link to the chapter https://alexcpn.github.io/html/NN/ml/4_backpropogation/ from my book in the making, https://alexcpn.github.io/html/NN/, which tries to explain this in a simple way.
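As a one-line illustration of that chain-rule decomposition (generic symbols, not taken from the linked chapter): for a weight w feeding a pre-activation z = wx + b, an activation a = σ(z), and a cost C(a),

$$\frac{\partial C}{\partial w} \;=\; \frac{\partial C}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w} \;=\; C'(a)\,\sigma'(z)\,x$$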