Neural network weighting

Posted on 2024-10-03 00:37:53


Recently I've studied the backpropagation network and have done some manual exercises.
After that, I came up with a question (which maybe doesn't make sense): is there any important difference between the following two weight-update methods?
1. Incremental training: the weights are updated immediately, once all the delta Wij's are known and before the next training vector is presented.
2. Batch training: the delta Wij's are computed and stored for each exemplar training vector. However, the delta Wij's are not used to update the weights immediately; the weight update is done at the end of a training epoch.

I've googled for a while but haven't found any results.
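
For concreteness, here is a minimal sketch of the two update schedules. The single linear layer, the `delta_w` helper, and the learning rate are illustrative placeholders, not part of the original question:

```python
import numpy as np

def delta_w(weights, x, target, lr=0.1):
    # Stand-in for backpropagation on a single linear layer:
    # returns the weight change (delta Wij) for one training vector.
    pred = x @ weights
    return lr * np.outer(x, target - pred)

def incremental_epoch(weights, data):
    # Incremental training: apply each delta before the next vector is presented.
    for x, t in data:
        weights = weights + delta_w(weights, x, t)
    return weights

def batch_epoch(weights, data):
    # Batch training: accumulate the deltas, apply them once at the end of the epoch.
    total = np.zeros_like(weights)
    for x, t in data:
        total += delta_w(weights, x, t)  # weights stay fixed during the epoch
    return weights + total
```

After one pass over the same data the two functions generally return different weights, which is exactly the difference being asked about.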

Comments (2)

流星番茄 2024-10-10 00:37:53

So what you are referring to are the two modes of performing gradient descent learning. In batch mode, changes to the weight matrix are accumulated over an entire presentation of the training data set (one 'epoch'); online training updates the weights after the presentation of each vector in the training set.

I believe the consensus is that online training is superior because it converges much faster (most studies report no apparent difference in accuracy). (See, e.g., Randall Wilson & Tony Martinez, "The General Inefficiency of Batch Training for Gradient Descent Learning", Neural Networks, 2003.)

The reason why online training converges faster is that it can follow curves in the error surface over each epoch. The practical significance of this is that you can use a larger learning rate (and therefore converge with fewer cycles through the training data).

Put another way, the accumulated weight change in batch training grows with the size of the training set. The result is that batch training takes large steps at each iteration and therefore misses local minima in the error-space topology: your solver oscillates rather than converging.
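
A rough numerical sketch of that scaling argument (the linear model and the data here are made up for illustration and are not from the answer): with a fixed learning rate, the summed per-exemplar deltas grow roughly in proportion to the number of training vectors, so the single end-of-epoch step gets larger as the training set gets bigger.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)
lr = 0.5
x_mean = np.array([1.0, 2.0, -1.0])

def step(w, x, t):
    # Per-exemplar weight change for a linear unit with squared error (illustrative).
    return lr * (t - w @ x) * x

for n in (10, 100, 1000):
    data = [(x_mean + rng.normal(scale=0.1, size=3), 1.0) for _ in range(n)]
    accumulated = sum(step(w, x, t) for x, t in data)  # the end-of-epoch batch step
    print(n, round(float(np.linalg.norm(accumulated)), 1))
# The norm of the accumulated step grows roughly linearly with n, so with a fixed
# learning rate the batch update overshoots unless the rate is scaled down
# (or the deltas are averaged instead of summed).
```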

Batch training is usually the 'default' (most often used in ML textbooks, etc.) and there's nothing wrong with using it as long as it converges within your acceptable time limits. Again, the difference in performance (resolution, or classification accuracy) is small or negligible.

素染倾城色 2024-10-10 00:37:53

Yes there is a difference between these two methods. The deltas that get computed are a function of the input vector and of the weights of the network. If you change the weights, the deltas that are computed from the next input vector will be different than if you didn't change the weights.

So, for the very first input vector, the same deltas will get computed regardless of the method you choose. Now, for the successive (incremental) method, the weights in the network will change, while in the simultaneous (batch) method the weights remain the same for now. When the 2nd input vector is presented, the two methods will produce different deltas, since the weights now differ between the two networks.
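
A small numerical sketch of that point (the single linear unit and the numbers are hypothetical, chosen only to show the effect): the deltas agree on the first vector and diverge on the second, once one network has already updated its weights.

```python
import numpy as np

def delta(w, x, t, lr=0.1):
    # Delta for a single linear unit with squared error (a stand-in for backprop).
    return lr * (t - w @ x) * x

w_incremental = np.array([0.5, -0.2])
w_batch = w_incremental.copy()

data = [(np.array([1.0, 2.0]), 1.0), (np.array([0.5, -1.0]), 0.0)]

# First vector: both methods compute the same delta.
d1_inc = delta(w_incremental, *data[0])
d1_bat = delta(w_batch, *data[0])
print(np.allclose(d1_inc, d1_bat))   # True

w_incremental = w_incremental + d1_inc   # incremental: apply the delta right away
stored = [d1_bat]                        # batch: only store it for later

# Second vector: the weights now differ, so the deltas differ too.
d2_inc = delta(w_incremental, *data[1])
d2_bat = delta(w_batch, *data[1])
print(np.allclose(d2_inc, d2_bat))   # False
```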
