Search for the corresponding node in a regression tree using rpart
I'm pretty new to R and I'm stuck with a pretty dumb problem.
I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.
Thanks to R the calibration part is easy to do and easy to control.
# The rpart package is needed
library(rpart)
# Load the large data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)
# Calibrate the regression tree
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 +
                Attribute4 + Attribute5,
              method="anova", data=my_data,
              control=rpart.control(minsplit=100, cp=0.0001))
After calibrating a big decision tree, I want to find, for a given data sample, the corresponding cluster of some new data (and thus the forecasted value).
The predict function seems to be perfect for that.
# Read validation data
validationData <- read.csv("my_sample.csv", sep=",", header=TRUE)
# Search for the probability in the tree
predict <- predict(tree, newdata=validationData, class="prob")
# Dump them in a file
write.table(predict, file="dump.txt")
However, with the predict method I only get the forecasted ratio of my new elements, and I can't find a way to get the decision tree leaf where my new elements belong.
I think it should be pretty easy to get, since the predict method must have found that leaf in order to return the ratio.
There are several parameters that can be given to the predict method through the class= argument, but for a regression tree they all seem to return the same thing (the value of the target attribute of the decision tree).
Does anyone know how to get the corresponding node in the decision tree?
Analyzing the node with the path.rpart method would help me understand the results.
Comments (4)
Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.
My solution is pretty kludgy, but I don't think there's a better way. The trick is to replace the predicted y values in the model frame with the corresponding node numbers.
Now the output of predict will be node numbers as opposed to predicted y values.
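A minimal sketch of that trick, assuming the built-in mtcars data as a stand-in for the asker's files. The names fit, fit_nodes, leaf_id, and paths are illustrative and not from the original post. It also feeds the resulting node numbers to path.rpart, which is what the question was ultimately after.

```r
library(rpart)

# Stand-in regression tree (assumption: mtcars replaces my_file.csv)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# Copy the tree and overwrite each node's fitted value with its node number;
# the row names of the frame are the rpart node numbers
fit_nodes <- fit
fit_nodes$frame$yval <- as.numeric(rownames(fit_nodes$frame))

# predict() on the modified copy returns the leaf number for each row
leaf_id <- predict(fit_nodes, newdata = mtcars)

# The unmodified tree still returns the usual predicted values
y_hat <- predict(fit, newdata = mtcars)

# Splits leading to each leaf that was reached, as a list keyed by node number
paths <- path.rpart(fit, nodes = sort(unique(leaf_id)), print.it = FALSE)
```

Working on a copy keeps the original tree usable for ordinary predictions, so the node number and the forecasted value can be written out side by side.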
(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
You can use the partykit package:
For your example just set
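A minimal sketch of the partykit route, assuming partykit is installed and using the built-in mtcars data as a stand-in for the asker's files (object names are illustrative). as.party() converts the rpart fit, and predict() with type = "node" reports the terminal node for each row.

```r
library(rpart)
library(partykit)

# Stand-in regression tree (assumption: mtcars replaces my_file.csv)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# Convert the rpart object to a party object
fit_party <- as.party(fit)

# Terminal node id for each row of the new data
node_id <- predict(fit_party, newdata = mtcars, type = "node")

# Applied to the objects from the question, this would read (not run here):
# predict(as.party(tree), newdata = validationData, type = "node")
```

As far as I know, partykit numbers nodes sequentially, so these ids generally do not match the rpart node numbers that path.rpart expects.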
I think what you want is type="vector" instead of class="prob" (I don't think class is an accepted parameter of the predict method), as explained in the rpart docs.
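A quick way to check this, assuming the built-in mtcars data as a stand-in for the asker's files (object names are illustrative). predict.rpart has a type argument but no class argument, so class="prob" appears to fall into ... and be ignored, leaving the default type="vector".

```r
library(rpart)

# Stand-in regression tree (assumption: mtcars replaces my_file.csv)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# class= is not a formal argument of predict.rpart; it falls into `...`
p_class <- predict(fit, newdata = mtcars, class = "prob")

# Explicit type = "vector" gives the mean response of each row's leaf
p_vector <- predict(fit, newdata = mtcars, type = "vector")

# Compare the two results
same_result <- isTRUE(all.equal(p_class, p_vector))
```

Either call returns predicted values rather than node numbers, which is consistent with the objection raised in the other answer.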