Optimal feature-to-instance ratio for a back-propagation neural network
I'm trying to perform leave-one-out cross-validation to model a particular problem using a back-propagation neural network. I have 8 features and 20 instances in my training data. I'm trying to make the NN learn a function to build a prediction model. The problem is that the prediction error rate is quite high. My guess is that the number of training instances is small compared to the number of features under consideration. Is this conclusion correct? Is there an optimal feature-to-instance ratio?
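For concreteness, here is a minimal sketch of the setup described above, assuming scikit-learn's MLPRegressor as a stand-in for the back-propagation network; the 20 x 8 data is random placeholder data, not the actual training set:

    # Leave-one-out cross-validation with an MLP regressor (sketch).
    import numpy as np
    from sklearn.model_selection import LeaveOneOut
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))   # 20 instances, 8 features
    y = rng.normal(size=20)        # placeholder targets

    errors = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])[0]
        errors.append((pred - y[test_idx][0]) ** 2)

    print("LOOCV mean squared error:", np.mean(errors))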
1 answer:
(This topic is often phrased in the ML literature as the acceptable size or shape of the data set, given that a data set is often described as an m x n matrix in which m is the number of rows (data points) and n is the number of columns (features); obviously m >> n is preferred.)
In any event, I am not aware of a general rule for an acceptable ratio of features to observations; there are probably a couple of reasons for this:
- such a ratio would depend strongly on the quality of the data (signal-to-noise ratio); and
- the number of features is just one element of model complexity (e.g., interaction among the features), and model complexity is the strongest determinant of the number of data instances (data points) required.
So there are two sets of approaches to this problem, and because they attack it from opposite directions (fewer features versus better use of the instances you have), both can be applied to the same model:
- reduce the number of features; or
- use a statistical technique to leverage the data that you do have.
A couple of suggestions, one for each of the two paths above:
Eliminate "non-important" features--i.e, those features that don't contribute to the variability in your response variable. Principal Component Analysis (PCA) is fast and reliable way to do this, though there are a number of other techniques which are generally subsumed under the rubric "dimension reduction."
Use bootstrap methods instead of cross-validation. The difference in methodology seems slight, but the (often substantial) improvement in reducing prediction error is well documented for multi-layer perceptrons (neural networks); see, e.g., Efron, B. and Tibshirani, R. (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method", Journal of the American Statistical Association, 92(438), 548-560. If you are not familiar with bootstrap methods for splitting training and testing data, the general technique is similar to cross-validation, except that instead of taking subsets of the entire data set you take subsamples drawn with replacement. Section 7.11 of Elements is a good introduction to bootstrap methods.
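A minimal sketch of this path, again assuming scikit-learn and placeholder data. This computes the simple "out-of-bag" bootstrap error, i.e., error measured on the instances left out of each resample; the .632+ estimator in the Efron and Tibshirani paper adds a correction on top of this quantity:

    # Bootstrap error estimation: train on resamples drawn with replacement,
    # score on the instances never drawn into each resample.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))    # placeholder data, as above
    y = rng.normal(size=20)

    n, B = len(X), 200              # B bootstrap resamples
    oob_errors = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)        # indices drawn with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # instances never drawn
        if len(oob) == 0:
            continue
        model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000)
        model.fit(X[boot], y[boot])
        oob_errors.append(np.mean((model.predict(X[oob]) - y[oob]) ** 2))

    print("bootstrap (out-of-bag) MSE estimate:", np.mean(oob_errors))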
The best single source on this general topic that I have found is Chapter 7, Model Assessment and Selection, from the excellent treatise The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. The book is available as a free download from its homepage.