Problem with C4.5
I'm using the C4.5 algorithm (can be found here)
My names file is:
Play, Don't Play.
Sky: Sunny, Cloudy, Rainy.
AirTemp: Warm, Cold.
Humidity: Normal, High.
Wind: Strong, Weak.
Water: Warm, Cool.
Forecast: Same, Change.
And my data is:
Sunny, Warm, Normal, Strong, Warm, Same, Play
Sunny, Warm, High, Strong, Warm, Same, Play
Sunny, Warm, High, Strong, Cool, Change, Play
Rainy, Cold, High, Strong, Warm, Change, Don't Play
Running the algorithm with the command
c4.5.exe -f v2 -v 1 > v2.r3
produces this output:
C4.5 [release 8] decision tree generator Tue Jan 18 16:41:25 2011
----------------------------------------
Options:
File stem <v2>
Verbosity level 1
Read 4 cases (6 attributes) from v2.data
4 items, total weight 4.0
best attribute Forecast inf 1.000 gain 0.311 val 0.311
Collapse tree for 4 items to leaf Play
Decision Tree:
Play (4.0/1.0)
Play (4.00:1.00/2.19)
Tree saved
Evaluation on training data (4 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
1 1(25.0%) 1 1(25.0%) (54.7%) <<
My problem is that the tree, which is built on the Forecast feature, gets collapsed into a single node. When I follow the pseudocode for the algorithm by hand, I always end up with a tree that uses the Sky feature to decide whether to play or not. What am I doing wrong?
I think the problem may be that I can't set the pruning confidence level. I've tried, but it won't accept my input. For example, neither
c4.5.exe -f v2 -v 1 -c 0.5 > v2.r3
nor
c4.5.exe -f v2 -v 1 -c 50% > v2.r3
works.
Comments (1)
You might want to try just -c 50 instead of -c 50%. I am not sure why it does not pick AirTemp, because that attribute should have the highest information gain.
I am also going to guess that you are using the Machine Learning book by Tom Mitchell. His book should have some examples; try those and see how the results compare.
Edit: Have you also run the examples on the site you linked to, and if so, did the results match?
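The information-gain comparison can be checked by hand. Here is a minimal Python sketch, not C4.5's own code, that computes plain information gain for each attribute over the four cases in the question. The names `ROWS`, `entropy`, `information_gain` and `branch_sizes` are made up for illustration. It also reports the branch sizes of each split, on the assumption (not verified against this build) that C4.5's minimum-cases setting (`-m`, default 2) requires at least two branches holding that many cases, which would rule out a 3/1 split.

```python
from collections import Counter
from math import log2

# Attribute order follows the names file in the question.
ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

# The four training cases from the question; the last field is the class.
ROWS = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Play"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same", "Play"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change", "Play"),
    ("Rainy", "Cold", "High", "Strong", "Warm", "Change", "Don't Play"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


def branch_sizes(rows, index):
    """Number of cases that fall into each branch, largest first."""
    return sorted(Counter(row[index] for row in rows).values(), reverse=True)


GAINS = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}

if __name__ == "__main__":
    for i, name in enumerate(ATTRIBUTES):
        print(f"{name:9s} gain={GAINS[name]:.3f} branches={branch_sizes(ROWS, i)}")
```

Sky and AirTemp tie for the highest gain, but both split the cases 3/1, while Forecast splits them 2/2 with the 0.311 gain shown in the log. If the assumption about `-m` holds, Forecast would be the only usable split, and rerunning with `-m 1` would be one way to test that.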