Feature selection with a decision tree
I'm supposed to perform feature selection on my dataset (independent variables: some aspects of a patient; target variable: whether the patient is ill or not) using a decision tree. After that, with the selected features, I have to implement a different ML model.
My doubt is: when I'm implementing the decision tree, is it necessary to have a train and a test set, or should I just fit the model on the whole data?
Comments (1)
It's necessary to split the dataset into train and test sets, because otherwise you would measure performance on the same data that was used for training, and over-fitting could go unnoticed.
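As a rough illustration of what this could look like in practice, here is a minimal sketch using scikit-learn. It assumes synthetic data from `make_classification` as a stand-in for the patient dataset, `SelectFromModel` as the way to turn tree importances into a feature mask, and `LogisticRegression` as the "different ML model"; none of these choices come from the question. The point is only that the tree is fitted on the training part, and the test part is touched once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the patient data: 500 rows, 20 features, binary target.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Split first, so the test set plays no part in choosing the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the decision tree on the training data only. SelectFromModel keeps the
# features whose importance is at least the mean importance (its default
# threshold for estimators that expose feature_importances_).
selector = SelectFromModel(DecisionTreeClassifier(max_depth=5, random_state=0))
selector.fit(X_train, y_train)
selected = selector.get_support()  # boolean mask over the 20 features

# Apply the same mask to both parts.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Train the downstream model (an assumption here) on the selected features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_sel, y_train)

train_acc = model.score(X_train_sel, y_train)
test_acc = model.score(X_test_sel, y_test)
print(f"kept {selected.sum()} of {selected.size} features")
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If the tree were fitted on the whole dataset instead, the test rows would already have influenced which features were kept, so the final test score would tend to look better than it should.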
Over-fitting is where the training error keeps decreasing while the generalization error increases. The generalization error is the error the model makes on new (never seen before) samples, so it reflects the model's ability to classify them correctly.
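To see the effect on a small scale, the following sketch compares training and test accuracy for decision trees of increasing depth. The data is synthetic (`make_classification` with some label noise via `flip_y`), and the depths are arbitrary; both are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy dataset: flip_y randomly reassigns about 10% of the labels.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for depth in (1, 3, 5, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

The unrestricted tree memorizes the training set, so its training accuracy says nothing about how it behaves on new patients; only the held-out score does. Without a test set, that gap would be invisible.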