Neural network backpropagation?

Posted 2024-08-18 04:42:50

Can anyone recommend a website or give a brief explanation of how backpropagation is implemented in a neural network? I understand the basic concept, but I'm unsure how to go about writing the code.

Many of the sources I've found simply show the equations without explaining why they are used, and the variable names make the code difficult to follow.

Example:

/* Compute the error term (delta) for each output node and return the
   total absolute error through *err. Arrays are 1-indexed, so index 0
   is unused; ABS is presumably a macro such as
   #define ABS(x) ((x) >= 0.0 ? (x) : -(x)) */
void bpnn_output_error(double *delta, double *target, double *output,
                       int nj, double *err)
{
  int j;
  double o, t, errsum;

  errsum = 0.0;
  for (j = 1; j <= nj; j++) {
    o = output[j];                       /* actual output of node j */
    t = target[j];                       /* desired output of node j */
    delta[j] = o * (1.0 - o) * (t - o);  /* the line in question */
    errsum += ABS(delta[j]);             /* accumulate absolute error */
  }
  *err = errsum;
}

In that example, can someone explain the purpose of

delta[j] = o * (1.0 - o) * (t - o);

Thanks.

Comments (4)

雨后彩虹 2024-08-25 04:42:50

The purpose of

delta[j] = o * (1.0 - o) * (t - o);

is to find the error of an output node in a backpropagation network.

o represents the output of the node, t is the expected value of output for the node.

The term o * (1.0 - o) is the first derivative of the sigmoid function, the transfer function commonly used here. (Other transfer functions are not uncommon, and each would require rewriting the code to use its own first derivative in place of the sigmoid's. A mismatch between the function and its derivative would likely mean that training does not converge.) The node has an "activation" value that is fed through the transfer function to obtain the output o, like

o = f(activation)
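For the standard logistic sigmoid (which the o * (1.0 - o) factor suggests is the transfer function in this code), that derivative is quick to verify:

$$f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)\bigl(1 - f(x)\bigr) = o\,(1 - o)$$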

The main thing is that backpropagation uses gradient descent, and the error is propagated backward by applying the chain rule. The problem is one of credit assignment, or blame if you will, for the hidden nodes, whose outputs are not directly comparable to expected values. We start with what is known and comparable: the output nodes. There, the error is taken to be proportional to the first derivative of the output times the raw error between the expected and actual outputs.

So more symbolically, we'd write that line as

delta[j] = f'(activation_j) * (t_j - o_j)

where f is your transfer function, and f' is the first derivative of it.

Further back in the hidden layers, the error at a node is its estimated contribution to the errors found at the next layer. So the deltas from the succeeding layer are multiplied by the connecting weights, and those products are summed. That sum is multiplied by the first derivative of the activation of the hidden node to get the delta for a hidden node, or

delta[j] = f'(activation_j) * Sum(delta[k] * w_jk)

where j now references a hidden node and k a node in a succeeding layer.
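Here is a minimal C sketch of that hidden-layer rule, written in the same 1-indexed style as the question's code; the function name, signature, and weight layout are illustrative assumptions, not necessarily the companion routine from the same library:

/* For each hidden node: delta is the sigmoid derivative times the
   weighted sum of the next layer's deltas. Assumes the same ABS macro
   as the question's code. */
void bpnn_hidden_error(double *delta_h, double *hidden, int nh,
                       double *delta_o, int no, double **w, double *err)
{
  int j, k;
  double h, sum, errsum;

  errsum = 0.0;
  for (j = 1; j <= nh; j++) {
    h = hidden[j];                   /* output of hidden node j */
    sum = 0.0;
    for (k = 1; k <= no; k++)        /* back-propagated error from above */
      sum += delta_o[k] * w[j][k];   /* w[j][k]: weight from hidden j to node k */
    delta_h[j] = h * (1.0 - h) * sum;   /* f'(activation) * propagated error */
    errsum += ABS(delta_h[j]);
  }
  *err = errsum;
}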

夜吻♂芭芘 2024-08-25 04:42:50

(t-o) is the error in the output of the network since t is the target output and o is the actual output. It is being stored in a normalized form in the delta array. The method used to normalize depends on the implementation and the o * ( 1.0 - o ) seems to be doing that (I could be wrong about that assumption).

This normalized error is accumulated for the entire training set to judge when the training is complete: usually when errsum is below some target threshold.
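A small sketch of that stopping test, reusing bpnn_output_error from the question; the function and variable names here are illustrative:

/* Sum the per-pattern err values for one epoch; training stops once
   the total falls below a chosen threshold. A real trainer would run
   the forward pass for each pattern before this call and update the
   weights after it. */
double epoch_error(double **targets, double **outputs, double **deltas,
                   int n_patterns, int nj)
{
  int p;
  double err, total = 0.0;

  for (p = 0; p < n_patterns; p++) {
    bpnn_output_error(deltas[p], targets[p], outputs[p], nj, &err);
    total += err;                  /* errsum accumulated over the set */
  }
  return total;                    /* compare against the threshold */
}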

柠北森屋 2024-08-25 04:42:50

Actually, if you know the theory, the programs should be easy to understand. You can read a textbook and work through some simple examples with a pencil to figure out the exact steps of the propagation. This is a general principle for implementing numerical programs: you must understand the details in small cases.

If you know Matlab, I'd suggest reading some Matlab source code (e.g. here), which is easier to understand than C.

For the code in your question, the names are fairly self-explanatory: output is likely the array of predictions, target the array of training labels, and delta the error between the predicted and true values, which also serves as the value used to update the weight vector.
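As a rough illustration of that last point, here is a sketch of a plain gradient-descent weight update driven by delta; the name adjust_weights, the 1-based indexing, and the learning rate eta are assumptions for illustration:

/* Move each weight a small step along the error gradient:
   w[j][k] connects node j in one layer to node k in the next. */
void adjust_weights(double **w, double *delta, double *layer_out,
                    int n_from, int n_to, double eta)
{
  int j, k;

  for (k = 1; k <= n_to; k++)
    for (j = 1; j <= n_from; j++)
      w[j][k] += eta * delta[k] * layer_out[j];
}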

春风十里 2024-08-25 04:42:50

Essentially, what backprop does is run the network on the training data, observe the output, and then adjust the weights, working iteratively from the output nodes back toward the input nodes.
