A question about the backpropagation algorithm for artificial neural networks: the order of weight updates

Posted on 2024-11-07 14:13:05

Hey everyone, I've been trying to get an ANN I coded to work with the backpropagation algorithm. I have read several papers on them, but I'm noticing a few discrepancies.

Here seems to be the super general format of the algorithm:

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Repeat steps 3 and 4 until we reach the input level

But here's the problem: The weights need to be updated at some point, obviously. However, because we're back propagating, we need to use the weights of previous layers (ones closer to the output layer, I mean) when calculating the error for layers closer to the input layer. But we already calculated the weight changes for the layers closer to the output layer! So, when we use these weights to calculate the error for layers closer to the input, do we use their old values, or their "updated values"?

In other words, if we were to put the step of updating the weights into my super general algorithm, would it be:

(Updating the weights immediately)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Update these weights
  6. Repeat steps 3, 4, and 5 until we reach the input level

OR

(Using the "old" values of the weights)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Store these changes in a matrix, but don't change these weights yet
  6. Repeat steps 3, 4, and 5 until we reach the input level
  7. Update the weights all at once using our stored values
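
To make the two orderings concrete, here is a rough Python sketch of how I picture them for a tiny two-layer net with sigmoid units and squared error (the function names, shapes, and learning rate are my own placeholders, not from any of the papers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Variant A: update each layer as soon as its delta is computed,
# so the hidden layer's error is computed with the NEW output weights.
def backprop_immediate(W1, W2, x, target, lr=0.1):
    h = sigmoid(W1 @ x)                      # hidden activations
    y = sigmoid(W2 @ h)                      # network output
    delta2 = (y - target) * y * (1 - y)      # output-layer delta
    W2 -= lr * np.outer(delta2, h)           # update output weights NOW
    delta1 = (W2.T @ delta2) * h * (1 - h)   # uses the already-updated W2
    W1 -= lr * np.outer(delta1, x)
    return W1, W2

# Variant B: compute every delta with the OLD weights,
# then apply all the stored changes at once.
def backprop_deferred(W1, W2, x, target, lr=0.1):
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    delta2 = (y - target) * y * (1 - y)
    delta1 = (W2.T @ delta2) * h * (1 - h)   # uses the old W2
    W2 -= lr * np.outer(delta2, h)           # updates applied at the end
    W1 -= lr * np.outer(delta1, x)
    return W1, W2
```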

In this paper I read, in both abstract examples (the ones based on figures 3.3 and 3.4), they say to use the old values, not the immediately updated ones. However, in their "worked example 3.1", they use the new values (even though they claim to be using the old values) when calculating the error of the hidden layer.

Also, in my book, "Introduction to Machine Learning" by Ethem Alpaydin, though there is a lot of abstract material I don't yet understand, he says: "Note that the change in the first-layer weight, delta-w_hj, makes use of the second-layer weight, v_h. Therefore, we should calculate the changes in both layers and update the first-layer weights, making use of the old value of the second-layer weights, then update the second-layer weights."
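
Writing out what I think he means for the single-output case (this is my own reconstruction from the book's notation, so treat the exact formulas as an assumption):

$$\Delta v_h = \eta\,(r - y)\,z_h, \qquad \Delta w_{hj} = \eta\,(r - y)\,v_h\,z_h(1 - z_h)\,x_j$$

The formula for delta-w_hj uses the old v_h; only after both changes are computed do we apply w_hj <- w_hj + delta-w_hj and then v_h <- v_h + delta-v_h.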

To be honest, it really seems like they just made a mistake and all the weights are updated simultaneously at the end, but I want to be sure. My ANN is giving me strange results, and I want to be positive that this isn't the cause.

Anyone know?

Thanks!


Comments (4)

稚然 2024-11-14 14:13:05

As far as I know, you should update the weights immediately. The purpose of back-propagation is to find weights that minimize the error of the ANN, and it does so by gradient descent. I think the algorithm description on the Wikipedia page is quite good. You may also double-check its implementation in the joone engine.

城歌 2024-11-14 14:13:05

You are usually backpropagating deltas, not errors. These deltas are calculated from the errors, but they do not mean the same thing. Once you have the deltas for layer n (counting from input to output), you use these deltas and the weights of layer n to calculate the deltas for layer n-1 (the one closer to the input). The deltas only have meaning for the old state of the network, not for the new state, so you should always use the old weights when propagating the deltas back toward the input.

In a sense, the deltas express how much each part of the NN contributed to the error on the previous step, not how much it will contribute to the error on the next step (because you do not know the actual error yet).
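
As a minimal sketch of what I mean (a generic fully connected net with sigmoid activations; the names are mine, not from any particular library):

```python
import numpy as np

def backward_pass(weights, activations, output_delta, lr=0.1):
    """Backpropagate deltas using the OLD weights, then update all layers.

    weights[n] maps the activations of layer n to layer n+1;
    activations[n] is the forward-pass output of layer n
    (activations[0] is the network input).
    """
    deltas = [output_delta]
    # Sweep from the output back toward the input. Every delta is
    # computed against the old, not-yet-updated weights.
    for n in range(len(weights) - 1, 0, -1):
        a = activations[n]
        deltas.insert(0, (weights[n].T @ deltas[0]) * a * (1 - a))
    # Only now, after the whole backward sweep, apply the changes.
    for n, W in enumerate(weights):
        W -= lr * np.outer(deltas[n], activations[n])
    return weights
```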

As with most machine-learning techniques, it will probably still work if you use the updated weights, but it might converge more slowly.

一杆小烟枪 2024-11-14 14:13:05

If you simply train it on a single input-output pair, my intuition would be to update the weights immediately, because the gradient is not constant. But I don't think your book is talking about only a single input-output pair. Usually you build an ANN because you have many input-output samples from a function you would like to model with it. Thus your loop should repeat from step 1, not from step 3.

If we label your two methods as new->online and old->offline, then we have two algorithms.

  • The online algorithm is good when you don't know how many sample input-output relations you are going to see, and you don't mind some randomness in the way the weights update.

  • The offline algorithm is good if you want to fit a particular set of data optimally. To avoid overfitting the samples in your data set, you can split it into a training set and a test set. You use the training set to update the weights, and the test set to measure how good a fit you have. When the error on the test set begins to increase, you are done.

Which algorithm is best depends on the purpose of using the ANN. Since you talk about training until you "reach input level", I assume you train until the output exactly matches the target value in the data set. In this case the offline algorithm is what you need. If you were building a backgammon-playing program, the online algorithm would be better, because you would have an unlimited data set.
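
A rough sketch of the two training loops, just to illustrate the difference (compute_gradients and apply_gradients are hypothetical methods standing in for whatever your network exposes):

```python
# Online (per-sample): the weights move after every input-output pair.
def train_online(net, samples, lr, epochs):
    for _ in range(epochs):
        for x, target in samples:
            grads = net.compute_gradients(x, target)  # one backprop pass
            net.apply_gradients(grads, lr)            # update immediately

# Offline (batch): accumulate gradients over the whole set, update once.
def train_offline(net, samples, lr, epochs):
    for _ in range(epochs):
        total = None
        for x, target in samples:
            grads = net.compute_gradients(x, target)
            total = grads if total is None else [t + g for t, g in zip(total, grads)]
        net.apply_gradients(total, lr)                # one update per epoch
```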

躲猫猫 2024-11-14 14:13:05

In this book, the author explains that the whole point of the backpropagation algorithm is that it lets you efficiently compute all the weight updates in a single backward pass. In other words, using the "old values" is what makes it efficient. Using the new values would be more computationally expensive, and that is why people use the "old values" when updating the weights.
