A problem with C4.5

Posted 2024-10-12 14:45:22

I'm using the C4.5 algorithm (can be found here)

My names file is here:

Play, Don't Play.

Sky: Sunny, Cloudy, Rainy.
AirTemp: Warm, Cold.
Humidity: Normal, High.
Wind: Strong, Weak.
Water: Warm, Cool.
Forecast: Same, Change.

And my data file (v2.data) is here:

Sunny, Warm, Normal, Strong, Warm, Same, Play
Sunny, Warm, High, Strong, Warm, Same, Play
Sunny, Warm, High, Strong, Cool, Change, Play
Rainy, Cold, High, Strong, Warm, Change, Don't Play

The output I get from the algorithm which I run with the command

c4.5.exe -f v2 -v 1  > v2.r3

is

C4.5 [release 8] decision tree generator Tue Jan 18 16:41:25 2011
----------------------------------------

    Options:
 File stem <v2>
 Verbosity level 1

Read 4 cases (6 attributes) from v2.data

4 items, total weight 4.0
 best attribute Forecast inf 1.000 gain 0.311 val 0.311
Collapse tree for 4 items to leaf Play

Decision Tree:
 Play (4.0/1.0)



Play (4.00:1.00/2.19)

Tree saved


Evaluation on training data (4 items):

  Before Pruning           After Pruning
 ----------------   ---------------------------
 Size      Errors   Size      Errors   Estimate

    1    1(25.0%)      1    1(25.0%)    (54.7%)   <<

My problem is that the tree, which was split on the feature Forecast, gets collapsed into a single node. When I follow the pseudocode for the algorithm by hand, I always end up with a tree that uses the feature Sky to decide whether to play or not. What am I doing wrong?

I suspect the cause is that I can't set the pruning confidence level. I've tried, but it won't accept my input. For example, neither

c4.5.exe -f v2 -v 1 -c 0.5  > v2.r3

nor

c4.5.exe -f v2 -v 1 -c 50%  > v2.r3

works.

Comments (1)

笑脸一如从前 2024-10-19 14:45:22

You might want to try just -c 50 instead of -c 50%. I'm not sure why it doesn't pick AirTemp, since it should have the highest information gain.
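To check the information-gain claim by hand, here is a minimal Python sketch that computes the plain ID3-style gain for each attribute on the four rows from the question. It is not C4.5's own code, and the names `ROWS`, `entropy`, `information_gain` and `gains` are just made up for this illustration. On this data it gives Forecast a gain of 0.311, which matches the "gain 0.311" line in the log, while Sky and AirTemp tie at a higher value. It also prints the branch sizes, because one possible explanation (an assumption I have not verified against this build) is C4.5's minimum-cases option `-m`, which by default requires at least two branches to hold two or more cases each. Sky and AirTemp both split the data 3/1, while Forecast splits it 2/2.

```python
from collections import Counter
from math import log2

# Attribute order and rows copied from the question's names/data files.
ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]
ROWS = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Play"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same", "Play"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change", "Play"),
    ("Rainy", "Cold", "High", "Strong", "Warm", "Change", "Don't Play"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    """Return (gain, branch sizes) for splitting rows on the attribute at index."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row[-1])
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    sizes = {value: len(g) for value, g in groups.items()}
    return entropy(labels) - remainder, sizes


def gains(rows=ROWS):
    """Gain and branch sizes for every attribute, keyed by attribute name."""
    return {name: information_gain(rows, i) for i, name in enumerate(ATTRIBUTES)}


if __name__ == "__main__":
    for name, (gain, sizes) in gains().items():
        print(f"{name:9s} gain {gain:.3f}  branch sizes {sizes}")
```

If the minimum-cases rule is what rejects Sky and AirTemp, rerunning with `-m 1` would be a quick way to test that guess.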

I'm also going to guess you are using the Machine Learning book by Tom Mitchell. His book should have some examples; try those and see how they compare.

Edit: Have you also run the examples on the site you linked to, and if so, did the results match?
