Is there an equivalent of "anova" (for lm) for an rpart object?
When using R's rpart function, I can easily fit a model with it. For example:
# Classification Tree with rpart
library(rpart)
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start,
method="class", data=kyphosis)
printcp(fit) # display the results
plotcp(fit)
summary(fit) # detailed summary of splits
# plot tree
plot(fit, uniform=TRUE,
main="Classification Tree for Kyphosis")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
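My question is: how can I measure the "importance" of each of the three explanatory variables (Age, Number, Start) to the model?
If this were a regression model, I could look at the p-values from the "anova" F-test (comparing lm models with and without the variable). But what is the equivalent of using "anova" on lm for an rpart object?
(I hope I have made my question clear.)
Thanks.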
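For reference, the nested-model comparison described in the question might look like the minimal sketch below. The built-in mtcars data and the variables mpg, wt, and hp are used purely for illustration and are not part of the original question.

```r
# Illustrative data only: mtcars ships with base R and is not from the question
reduced <- lm(mpg ~ wt, data = mtcars)       # model without the variable of interest
full    <- lm(mpg ~ wt + hp, data = mtcars)  # model with the variable added

# F-test comparing the two nested models; a small Pr(>F) suggests
# that the added variable (hp) improves the fit
cmp <- anova(reduced, full)
print(cmp)
```

The p-value for the added variable is in the `Pr(>F)` column of the second row of `cmp`.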
Comments (1)
Of course, anova would be impossible here, since anova involves calculating the total variation in the response variable and partitioning it into informative components (SSA, SSE). I can't see how one could calculate a sum of squares for a categorical variable like Kyphosis.
I think what you are actually talking about is attribute selection (or evaluation). I would use the information gain measure, for example. I think this is what is used to select the test attribute at each node in the tree: the attribute with the highest information gain (or greatest entropy reduction) is chosen as the test attribute for the current node. This attribute minimizes the information needed to classify the samples in the resulting partitions.
I am not aware of a method in R for ranking attributes by their information gain, but I know there is one in WEKA, named InfoGainAttributeEval. It evaluates the worth of an attribute by measuring the information gain with respect to the class. If you use Ranker as the search method, the attributes are ranked by their individual evaluations.
EDIT
I finally found a way to do this in R, using the CORElearn library.
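The answer does not show the code it used, so the following is only a sketch of how such a ranking might look, not necessarily what the answerer ran. It assumes the CORElearn package is installed and uses its `attrEval` function with the `"InfoGain"` estimator on the same kyphosis data as in the question.

```r
# Sketch only: assumes CORElearn is installed, e.g. install.packages("CORElearn")
library(rpart)      # provides the kyphosis data set used in the question
library(CORElearn)

# Score each attribute by its information gain with respect to the class
ig <- attrEval(Kyphosis ~ Age + Number + Start,
               data = kyphosis, estimator = "InfoGain")

# Rank the attributes from most to least informative
ranked <- sort(ig, decreasing = TRUE)
print(ranked)
```

Note that rpart itself splits classification trees on the Gini index by default; passing `parms = list(split = "information")` to `rpart` switches it to information-based splitting. Recent versions of rpart also store their own importance scores in `fit$variable.importance`, which may be worth comparing against this ranking.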