Where does the verification data go when training an ANN?

Posted 2024-08-19 12:52:37 · 268 words · 16 views · 0 comments

The need to set aside part of the training set as verification data is straightforward, but I am not really clear on how, and at what stage of training, it should be incorporated.

Is it at the end of training (after reaching a good minimum for the training data)? If so, what should be done if the verification data yields a large error?

Is it throughout the training (keep looking for a minimum while errors for both the training and verification data aren't satisfactory)?

No matter what I try, the network seems to have trouble learning both the training and the verification data once the verification set reaches a certain size (I recall reading somewhere that 70% training / 30% verification is a common ratio, but I get stuck at a much smaller verification share), while it has no problem learning the same data when all of it is used for training.

Comments (3)

渔村楼浪 2024-08-26 12:52:37

The important thing is that your verification set must never feed back into the training. You can plot the error rate on the verification set, but the training algorithm may only use the error rate on the training set to correct itself.
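To make that separation concrete, here is a minimal sketch, assuming a toy one-variable linear model fitted by plain gradient descent rather than a real ANN. The names `mse` and `train`, the synthetic data, and the 70/30 split are illustrative assumptions, not something from the question. The point is only that the gradients come from the training set, while the verification error is recorded for plotting and nothing else.

```python
import random

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_set, val_set, epochs=200, lr=0.05):
    """Gradient descent on the training set; the verification set is only measured."""
    w, b = 0.0, 0.0
    history = []  # (train_mse, val_mse) per epoch, intended for plotting
    n = len(train_set)
    for _ in range(epochs):
        # Gradients are computed from the training set alone.
        gw = sum(2 * (w * x + b - y) * x for x, y in train_set) / n
        gb = sum(2 * (w * x + b - y) for x, y in train_set) / n
        w -= lr * gw
        b -= lr * gb
        # The verification error is observed but never used to update w or b.
        history.append((mse(w, b, train_set), mse(w, b, val_set)))
    return w, b, history

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus small noise.
rng = random.Random(0)
points = [(x / 10, 2 * (x / 10) + 1 + rng.gauss(0, 0.1)) for x in range(-10, 11)]
rng.shuffle(points)
cut = int(0.7 * len(points))  # 70% training, 30% verification
train_set, val_set = points[:cut], points[cut:]

w, b, history = train(train_set, val_set)
```

Plotting the two columns of `history` against the epoch number gives the pair of curves the answer describes.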

软的没边 2024-08-26 12:52:37

The validation data set is mostly used for early stopping.

  1. Train the network for epoch i on the training data. Let the training error be e(t, i).
  2. Evaluate the network on the validation set. Let that error be e(v, i).
  3. If e(v, i) > e(v, i-1), stop training. Otherwise go to 1.

So it helps you see when the network overfits, that is, when it models the specifics of the training data too closely. The idea is that with an ANN you want to achieve good generalization from the training data to unseen data. The validation set helps you determine the point at which the network starts to specialize too much on the training data.
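The three steps above can be sketched as a small loop. This is only an illustration under stated assumptions: `train_one_epoch` and `validation_error` are hypothetical callables standing in for whatever your network library provides, and the scripted error curve below is made-up data used to exercise the loop.

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000):
    """Run epochs until the validation error rises.

    Returns (epochs_run, lowest_validation_error_seen_before_the_rise).
    """
    previous = float("inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()             # step 1: update weights from training data only
        current = validation_error()  # step 2: e(v, i)
        if current > previous:        # step 3: e(v, i) > e(v, i-1), so stop
            return epoch, previous
        previous = current
    return max_epochs, previous

# Stand-in "network" (hypothetical): validation error follows a scripted U-shaped curve.
scripted = [0.9, 0.6, 0.4, 0.35, 0.38, 0.5]
state = {"epoch": 0}

def fake_epoch():
    state["epoch"] += 1

def fake_validation_error():
    return scripted[state["epoch"] - 1]

stopped_at, best = train_with_early_stopping(fake_epoch, fake_validation_error)
print(stopped_at, best)  # 5 0.35
```

Note that when the loop stops, the weights are those from the epoch where the error rose, one step past the best point. In practice people usually keep a copy of the best weights seen so far, and often wait for several consecutive increases before stopping, because validation error on real data is noisy.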

情徒 2024-08-26 12:52:37

That means over-training.
I advise checking the verification set's MSE during training.
See the Overtraining Caution System of FannTool:
http://fanntool.googlecode.com/files/FannTool_Users_Guide.zip
