Neural network backpropagation?
Can anyone recommend a website, or give a brief explanation, of how backpropagation is implemented in a neural network? I understand the basic concept, but I'm unsure how to go about writing the code.
Many of the sources I've found simply show the equations without explaining why they're doing it, and the variable names make it difficult to work out.
Example:
void bpnn_output_error(delta, target, output, nj, err)
double *delta, *target, *output, *err;
int nj;
{
    int j;
    double o, t, errsum;

    errsum = 0.0;
    for (j = 1; j <= nj; j++) {
        o = output[j];
        t = target[j];
        delta[j] = o * (1.0 - o) * (t - o);
        errsum += ABS(delta[j]);
    }
    *err = errsum;
}
In that example, can someone explain the purpose of
delta[j] = o * (1.0 - o) * (t - o);
?
Thanks.
4 Answers
The purpose of
delta[j] = o * (1.0 - o) * (t - o);
is to find the error of an output node in a backpropagation network.
o represents the output of the node; t is the expected output value for the node.
The term (o * (1.0 - o)) is the derivative of a common transfer function, the sigmoid function. (Other transfer functions are not uncommon; they would require rewriting the code to use their first derivative in place of the sigmoid's. A mismatch between function and derivative would likely mean that training would not converge.) The node has an "activation" value that is fed through a transfer function to obtain the output o, like
o = f(activation)
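In code, that identity looks like the following minimal sketch (the function names here are mine, not from the bpnn source):

#include <math.h>

/* If o = sigmoid(x), then the derivative of the sigmoid at x is exactly
 * o * (1 - o), so backprop code can compute the derivative from the
 * node's output alone, without keeping the raw activation x around. */
double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

double sigmoid_prime_from_output(double o)
{
    return o * (1.0 - o);
}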
The main thing is that backpropagation uses gradient descent, and the error gets backward-propagated by application of the Chain Rule. The problem is one of credit assignment, or blame if you will, for the hidden nodes whose output is not directly comparable to the expected value. We start with what is known and comparable, the output nodes. The error is taken to be proportional to the first derivative of the output times the raw error value between the expected output and actual output.
So more symbolically, we'd write that line as
delta[j] = f'(activation_j) * (t_j - o_j)
where f is your transfer function, and f' is the first derivative of it.
Further back in the hidden layers, the error at a node is its estimated contribution to the errors found at the next layer. So the deltas from the succeeding layer are multiplied by the connecting weights, and those products are summed. That sum is multiplied by the first derivative of the activation of the hidden node to get the delta for a hidden node, or
delta[j] = f'(activation_j) * Sum(delta[k] * w[j][k])
where j now references a hidden node and k a node in a succeeding layer.
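A sketch of that hidden-layer step, in the style of the code in the question, might look like this. It is a hedged illustration, not the original implementation: the name bpnn_hidden_error, the who[j][k] weight layout (hidden node j to succeeding node k), and the 1-based array convention are all assumptions.

#include <math.h>

/* Computes the delta for each hidden node as f'(h) times the weighted
 * sum of the deltas from the succeeding layer, mirroring the formula
 * above.  Accumulates |delta| into *err like bpnn_output_error does. */
void bpnn_hidden_error(double *delta_h, int nh, double *delta_o, int no,
                       double **who, double *hidden, double *err)
{
    int j, k;
    double h, sum, errsum;

    errsum = 0.0;
    for (j = 1; j <= nh; j++) {
        h = hidden[j];                     /* output of hidden node j */
        sum = 0.0;
        for (k = 1; k <= no; k++)          /* gather blame from next layer */
            sum += delta_o[k] * who[j][k];
        delta_h[j] = h * (1.0 - h) * sum;  /* f'(h) * summed error */
        errsum += fabs(delta_h[j]);
    }
    *err = errsum;
}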
(t - o) is the error in the output of the network, since t is the target output and o is the actual output. It is being stored in a normalized form in the delta array. The method used to normalize depends on the implementation, and the o * (1.0 - o) factor seems to be doing that (I could be wrong about that assumption). This normalized error is accumulated over the entire training set to judge when training is complete: usually when errsum falls below some target threshold.
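To make that stopping criterion concrete, here is a minimal, self-contained sketch: a single sigmoid node trained toward a target of 1, looping until the accumulated |delta| drops below a threshold. The one-node "network", the learning rate, and the threshold value are assumptions chosen for illustration.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double w = 0.1;           /* single weight */
    double x = 1.0, t = 1.0;  /* one training pattern and its target */
    double eta = 0.5;         /* learning rate (assumed) */
    double errsum;
    int epoch = 0;

    do {
        double o = 1.0 / (1.0 + exp(-w * x));    /* forward pass */
        double delta = o * (1.0 - o) * (t - o);  /* formula from the question */
        w += eta * delta * x;                    /* gradient-descent update */
        errsum = fabs(delta);                    /* "training set" of one */
        epoch++;
    } while (errsum > 0.001);                    /* assumed threshold */

    printf("converged after %d epochs, w = %f\n", epoch, w);
    return 0;
}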
Actually, if you know the theory, the programs should be easy to understand. You can read the book and work through some simple examples with a pencil to figure out the exact steps of the propagation. This is a general principle for implementing numerical programs: you must understand the details in small cases.
If you know Matlab, I'd suggest reading some Matlab source code (e.g. here), which is easier to understand than C.
For the code in your question, the names are quite self-explanatory: output is probably the array of your predictions, target is probably the array of training labels, and delta is the error between the predicted and true values; it also serves as the value used to update the weight vector.
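As a hedged sketch of that last point, the delta computed above typically feeds a gradient-descent weight update like the one below. The function name, the w[i][j] layout (from node i to node j), and eta are my assumptions, not part of the original code.

/* Classic gradient-descent update: each weight moves by eta times the
 * delta of the downstream node times the upstream node's output.
 * Uses the same 1-based indexing as the code in the question. */
void bpnn_adjust_weights(double **w, double *delta, double *input,
                         int ni, int nj, double eta)
{
    int i, j;

    for (i = 1; i <= ni; i++)
        for (j = 1; j <= nj; j++)
            w[i][j] += eta * delta[j] * input[i];
}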
Essentially, what backprop does is run the network on the training data, observe the output, and then adjust the weights, working iteratively from the output nodes back to the input nodes.
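Putting the pieces together, here is a minimal, self-contained sketch of that loop: a tiny sigmoid network trained on XOR by plain gradient descent. Everything in it (network size, data set, learning rate, epoch count, seed) is an illustrative assumption rather than the original bpnn code, and it may need a different seed or more epochs to converge.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NI 2        /* input nodes */
#define NH 4        /* hidden nodes */
#define ETA 0.5     /* learning rate (assumed) */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void)
{
    double in[4][NI] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    double tgt[4] = {0, 1, 1, 0};         /* XOR targets */
    double wih[NI + 1][NH], who[NH + 1];  /* last index holds the bias */
    double hid[NH];
    int p, i, j, epoch;

    srand(1);
    for (i = 0; i <= NI; i++)
        for (j = 0; j < NH; j++)
            wih[i][j] = rand() / (double)RAND_MAX - 0.5;
    for (j = 0; j <= NH; j++)
        who[j] = rand() / (double)RAND_MAX - 0.5;

    for (epoch = 1; epoch <= 20000; epoch++) {
        double errsum = 0.0;
        for (p = 0; p < 4; p++) {
            /* 1. Run the network forward on the training pattern. */
            for (j = 0; j < NH; j++) {
                double a = wih[NI][j];                 /* bias */
                for (i = 0; i < NI; i++)
                    a += wih[i][j] * in[p][i];
                hid[j] = sigmoid(a);
            }
            double act = who[NH];                      /* bias */
            for (j = 0; j < NH; j++)
                act += who[j] * hid[j];
            double out = sigmoid(act);

            /* 2. Observe the output: the delta asked about. */
            double dout = out * (1.0 - out) * (tgt[p] - out);
            errsum += fabs(dout);

            /* 3. Hidden deltas, then weight updates, moving from the
             *    output node back toward the inputs. */
            for (j = 0; j < NH; j++) {
                double dh = hid[j] * (1.0 - hid[j]) * dout * who[j];
                who[j] += ETA * dout * hid[j];
                for (i = 0; i < NI; i++)
                    wih[i][j] += ETA * dh * in[p][i];
                wih[NI][j] += ETA * dh;                /* hidden bias */
            }
            who[NH] += ETA * dout;                     /* output bias */
        }
        if (epoch % 5000 == 0)
            printf("epoch %d: errsum %f\n", epoch, errsum);
    }
    return 0;
}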