A question about the backpropagation algorithm for artificial neural networks: the order of weight updates

Posted on 2024-11-07 14:13:05

Hey everyone, I've been trying to get an ANN I coded to work with the backpropagation algorithm. I have read several papers on them, but I'm noticing a few discrepancies.

Here seems to be the super general format of the algorithm:

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Repeat steps 3 and 4 until we reach the input level

But here's the problem: The weights need to be updated at some point, obviously. However, because we're back propagating, we need to use the weights of previous layers (ones closer to the output layer, I mean) when calculating the error for layers closer to the input layer. But we already calculated the weight changes for the layers closer to the output layer! So, when we use these weights to calculate the error for layers closer to the input, do we use their old values, or their "updated values"?

In other words, if we were to put the step of updating the weights into my super general algorithm, would it be:

(Updating the weights immediately)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Update these weights
  6. Repeat steps 3, 4, and 5 until we reach the input level

OR

(Using the "old" values of the weights)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Store these changes in a matrix, but don't change these weights yet
  6. Repeat steps 3, 4, and 5 until we reach the input level
  7. Update the weights all at once using our stored values
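
To make the two orderings concrete, here is a rough Python sketch of how I picture them for a tiny two-layer net with sigmoid units and squared error (the function names, shapes, and learning rate are my own placeholders, not from any of the papers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Variant A: update each layer as soon as its delta is computed,
# so the hidden layer's error is computed with the NEW output weights.
def backprop_immediate(W1, W2, x, target, lr=0.1):
    h = sigmoid(W1 @ x)                      # hidden activations
    y = sigmoid(W2 @ h)                      # network output
    delta2 = (y - target) * y * (1 - y)      # output-layer delta
    W2 -= lr * np.outer(delta2, h)           # update output weights NOW
    delta1 = (W2.T @ delta2) * h * (1 - h)   # uses the already-updated W2
    W1 -= lr * np.outer(delta1, x)
    return W1, W2

# Variant B: compute every delta with the OLD weights,
# then apply all the stored changes at once.
def backprop_deferred(W1, W2, x, target, lr=0.1):
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    delta2 = (y - target) * y * (1 - y)
    delta1 = (W2.T @ delta2) * h * (1 - h)   # uses the old W2
    W2 -= lr * np.outer(delta2, h)           # updates applied at the end
    W1 -= lr * np.outer(delta1, x)
    return W1, W2
```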

In this paper I read, in both abstract examples (the ones based on figures 3.3 and 3.4), they say to use the old values, not the immediately updated ones. However, in their "worked example 3.1", they use the new values (even though they claim to be using the old values) when calculating the error of the hidden layer.

Also, in my book, "Introduction to Machine Learning" by Ethem Alpaydin, though there is a lot of abstract material I don't yet understand, he says: "Note that the change in the first-layer weight, delta-w_hj, makes use of the second-layer weight, v_h. Therefore, we should calculate the changes in both layers and update the first-layer weights, making use of the old value of the second-layer weights, then update the second-layer weights."
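
Writing out what I think he means for the single-output case (this is my own reconstruction from the book's notation, so treat the exact formulas as an assumption):

$$\Delta v_h = \eta\,(r - y)\,z_h, \qquad \Delta w_{hj} = \eta\,(r - y)\,v_h\,z_h(1 - z_h)\,x_j$$

The formula for delta-w_hj uses the old v_h; only after both changes are computed do we apply w_hj <- w_hj + delta-w_hj and then v_h <- v_h + delta-v_h.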

To be honest, it really seems like they just made a mistake and all the weights are updated simultaneously at the end, but I want to be sure. My ANN is giving me strange results, and I want to be positive that this isn't the cause.

Anyone know?

Thanks!


Comments (4)

稚然 2024-11-14 14:13:05

As far as I know, you should update the weights immediately. The purpose of back-propagation is to find weights that minimize the error of the ANN, and it does so by gradient descent. I think the algorithm description on the Wikipedia page is quite good. You may also double-check its implementation in the joone engine.

城歌 2024-11-14 14:13:05

You are usually backpropagating deltas, not errors. These deltas are calculated from the errors, but they do not mean the same thing. Once you have the deltas for layer n (counting from input to output), you use these deltas and the weights of layer n to calculate the deltas for layer n-1 (the one closer to the input). The deltas only have meaning for the old state of the network, not for the new state, so you should always use the old weights when propagating the deltas back toward the input.

In a sense, the deltas express how much each part of the NN contributed to the error on the previous step, not how much it will contribute to the error on the next step (because you do not know the actual error yet).
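
As a minimal sketch of what I mean (a generic fully connected net with sigmoid activations; the names are mine, not from any particular library):

```python
import numpy as np

def backward_pass(weights, activations, output_delta, lr=0.1):
    """Backpropagate deltas using the OLD weights, then update all layers.

    weights[n] maps the activations of layer n to layer n+1;
    activations[n] is the forward-pass output of layer n
    (activations[0] is the network input).
    """
    deltas = [output_delta]
    # Sweep from the output back toward the input. Every delta is
    # computed against the old, not-yet-updated weights.
    for n in range(len(weights) - 1, 0, -1):
        a = activations[n]
        deltas.insert(0, (weights[n].T @ deltas[0]) * a * (1 - a))
    # Only now, after the whole backward sweep, apply the changes.
    for n, W in enumerate(weights):
        W -= lr * np.outer(deltas[n], activations[n])
    return weights
```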

As with most machine-learning techniques, it will probably still work if you use the updated weights, but it might converge more slowly.

一杆小烟枪 2024-11-14 14:13:05

If you simply train it on a single input-output pair, my intuition would be to update the weights immediately, because the gradient is not constant. But I don't think your book is talking about only a single input-output pair. Usually you build an ANN because you have many input-output samples from a function you would like to model with it. Thus your loop should repeat from step 1, not from step 3.

If we label your two methods as new->online and old->offline, then we have two algorithms.

  • The online algorithm is good when you don't know how many sample input-output relations you are going to see, and you don't mind some randomness in the way the weights update.

  • The offline algorithm is good if you want to fit a particular set of data optimally. To avoid overfitting the samples in your data set, you can split it into a training set and a test set. You use the training set to update the weights, and the test set to measure how good a fit you have. When the error on the test set begins to increase, you are done.

Which algorithm is best depends on the purpose of using the ANN. Since you talk about training until you "reach input level", I assume you train until the output exactly matches the target value in the data set. In this case the offline algorithm is what you need. If you were building a backgammon-playing program, the online algorithm would be better, because you would have an unlimited data set.
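
A rough sketch of the two training loops, just to illustrate the difference (compute_gradients and apply_gradients are hypothetical methods standing in for whatever your network exposes):

```python
# Online (per-sample): the weights move after every input-output pair.
def train_online(net, samples, lr, epochs):
    for _ in range(epochs):
        for x, target in samples:
            grads = net.compute_gradients(x, target)  # one backprop pass
            net.apply_gradients(grads, lr)            # update immediately

# Offline (batch): accumulate gradients over the whole set, update once.
def train_offline(net, samples, lr, epochs):
    for _ in range(epochs):
        total = None
        for x, target in samples:
            grads = net.compute_gradients(x, target)
            total = grads if total is None else [t + g for t, g in zip(total, grads)]
        net.apply_gradients(total, lr)                # one update per epoch
```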

躲猫猫 2024-11-14 14:13:05

In this book, the author explains that the whole point of the backpropagation algorithm is that it lets you efficiently compute all the weight updates in a single backward pass. In other words, using the "old values" is what makes it efficient. Using the new values would be more computationally expensive, and that is why people use the "old values" when updating the weights.
